Line 17: |
Line 17: |
| | | |
| ==Definition== | | ==Definition== |
− | The term big data has been in use since the 1990s, and some credit John Mashey with popularizing it.<ref>{{Cite web |title= Big Data ... and the Next Wave of InfraStress |author= John R. Mashey |date= 25 April 1998 |publisher= Usenix |work= Slides from invited talk |url= http://static.usenix.org/event/usenix99/invited_talks/mashey.pdf |access-date= 28 September 2016 }}</ref><ref>{{cite news|title=The Origins of 'Big Data': An Etymological Detective Story |author=Steve Lohr |date= 1 February 2013 |url=http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/ |work= [[The New York Times]] |access-date= 28 September 2016 }}</ref> Big data usually includes data sets whose size exceeds the ability of commonly used software tools to capture, manage, and process the data within a tolerable time.<ref name="Editorial">{{cite journal | last1 = Snijders | first1 = C. | last2 = Matzat | first2 = U. | last3 = Reips | first3 = U.-D. | year = 2012 | title = 'Big Data': Big gaps of knowledge in the field of Internet | url = http://www.ijis.net/ijis7_1/ijis7_1_editorial.html | journal = International Journal of Internet Science | volume = 7 | pages = 1–5 }}</ref> Big data encompasses unstructured, semi-structured, and structured data, but the main focus is on unstructured data.<ref name="Springer 2017">{{cite book |chapter=Towards Differentiating Business Intelligence, Big Data, Data Analytics and Knowledge Discovery |last1=Dedić |first1=N. |title=Innovations in Enterprise Information Systems Management and Engineering |last2=Stanier |first2=C. 
|issn=1865-1356 |oclc=909580101 |publisher=Springer International Publishing |location=Berlin ; Heidelberg |year=2017 |volume= 285|pages=114–122 |doi=10.1007/978-3-319-58801-8_10 |series=Lecture Notes in Business Information Processing |isbn=978-3-319-58800-1 |chapter-url=http://eprints.staffs.ac.uk/3551/1/Towards%20Differentiating%20Business%20Intelligence%20Big%20Data%20Data%20Analytics%20and%20Knowldge%20Discovery.docx }}</ref> The “size” of big data is a fairly flexible measure, ranging from a few dozen terabytes to many zettabytes of data.<ref name="Everts">{{cite magazine|last1=Everts |first1=Sarah |title=Information Overload |magazine=[[Distillations (magazine)|Distillations]] |date=2016| volume=2|issue=2|pages=26–33|url =https://www.sciencehistory.org/distillations/magazine/information-overload| access-date=22 March 2018}}</ref> Big data requires a set of techniques with new forms of integration to handle data sets that are diverse, complex, and massive in scale.<ref>{{cite journal | last1 = Ibrahim | last2 = Targio Hashem | first2 = Abaker | last3 = Yaqoob | first3 = Ibrar | last4 = Badrul Anuar | first4 = Nor | last5 = Mokhtar | first5 = Salimah | last6 = Gani | first6 = Abdullah | last7 = Ullah Khan | first7 = Samee | year = 2015 | title = big data" on cloud computing: Review and open research issues | journal = Information Systems | volume = 47 | pages = 98–115 | doi = 10.1016/j.is.2014.07.006 }}</ref> | + | The term big data has been in use since the 1990s, and some credit John Mashey with popularizing it.<ref>{{Cite web |title= Big Data ... and the Next Wave of InfraStress |author= John R. Mashey |date= 25 April 1998 |publisher= Usenix |work= Slides from invited talk |url= http://static.usenix.org/event/usenix99/invited_talks/mashey.pdf |access-date= 28 September 2016 }}</ref><ref>{{cite news|title=The Origins of 'Big Data': An Etymological Detective Story |author=Steve Lohr |date= 1 February 2013 |url=http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/ |work= [[The New York Times]] |access-date= 28 September 2016 }}</ref> Big data usually includes data sets whose size exceeds the ability of commonly used software tools to capture, manage, and process the data within a tolerable time.<ref name="Editorial">{{cite journal | last1 = Snijders | first1 = C. 
| last2 = Matzat | first2 = U. | last3 = Reips | first3 = U.-D. | year = 2012 | title = 'Big Data': Big gaps of knowledge in the field of Internet | url = http://www.ijis.net/ijis7_1/ijis7_1_editorial.html | journal = International Journal of Internet Science | volume = 7 | pages = 1–5 }}</ref> Big data encompasses unstructured, semi-structured, and structured data, but the main focus is on unstructured data.<ref name="Springer 2017">{{cite book |chapter=Towards Differentiating Business Intelligence, Big Data, Data Analytics and Knowledge Discovery |last1=Dedić |first1=N. |title=Innovations in Enterprise Information Systems Management and Engineering |last2=Stanier |first2=C. |issn=1865-1356 |oclc=909580101 |publisher=Springer International Publishing |location=Berlin ; Heidelberg |year=2017 |volume= 285|pages=114–122 |doi=10.1007/978-3-319-58801-8_10 |series=Lecture Notes in Business Information Processing |isbn=978-3-319-58800-1 |chapter-url=http://eprints.staffs.ac.uk/3551/1/Towards%20Differentiating%20Business%20Intelligence%20Big%20Data%20Data%20Analytics%20and%20Knowldge%20Discovery.docx }}</ref> |
| | | |
| | | |
− | Some organizations have added “variety”, “veracity”, and various other terms beginning with “V” to describe it, but this revision has been challenged by some industry authorities.<ref>{{cite magazine|last=Grimes|first=Seth|title=Big Data: Avoid 'Wanna V' Confusion| url=http://www.informationweek.com/big-data/big-data-analytics/big-data-avoid-wanna-v-confusion/d/d-id/1111077|magazine=[[InformationWeek]]|access-date = 5 January 2016}}</ref> The Vs of big data are often referred to as the three Vs, four Vs, and Vs. They stand for the volume, variety, velocity, veracity, and value of big data.<ref name=":0">{{Cite web|date=2016-09-17|title=The 5 V's of big data|url=https://www.ibm.com/blogs/watson-health/the-5-vs-of-big-data/|access-date=2021-01-20|website=Watson Health Perspectives|language=en-US}}</ref> Variability is often regarded as an additional attribute of big data.
| + | Under the comparative definition, big data refers to data sets whose size exceeds the ability of conventional database tools to capture, store, manage, and analyze. This is an evolving view: the threshold changes over time and across sectors. Overall, the “size” of big data is a fairly flexible measure, ranging from a few dozen terabytes to many zettabytes of data.<ref name="Everts">{{cite magazine|last1=Everts |first1=Sarah |title=Information Overload |magazine=[[Distillations (magazine)|Distillations]] |date=2016| volume=2|issue=2|pages=26–33|url =https://www.sciencehistory.org/distillations/magazine/information-overload| access-date=22 March 2018}}</ref> Big data requires a set of techniques with new forms of integration to handle data sets that are diverse, complex, and massive in scale.<ref>{{cite journal | last1 = Ibrahim | last2 = Targio Hashem | first2 = Abaker | last3 = Yaqoob | first3 = Ibrar | last4 = Badrul Anuar | first4 = Nor | last5 = Mokhtar | first5 = Salimah | last6 = Gani | first6 = Abdullah | last7 = Ullah Khan | first7 = Samee | year = 2015 | title = big data" on cloud computing: Review and open research issues | journal = Information Systems | volume = 47 | pages = 98–115 | doi = 10.1016/j.is.2014.07.006 }}</ref> |
| | | |
| | | |
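− | A 2018 definition states that “Big data is where parallel computing tools are needed to handle data”, and notes that “This represents a distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the guarantees and capabilities made by Codd's relational model.”<ref>{{Cite book|last=Fox|first=Charles|date=25 March 2018|title=Data Science for Transport| url=https://www.springer.com/us/book/9783319729527|publisher=Springer|isbn=9783319729527|series=Springer Textbooks in Earth Sciences, Geography and Environment}}</ref>
| + | Under the attribute definition, big data describes a new generation of technologies and architectures designed to extract value from large volumes of diverse data through high-velocity capture, discovery, and analysis. Some organizations have added “variety”, “veracity”, and various other terms beginning with “V” to describe it, but this revision has been challenged by some industry authorities.<ref>{{cite magazine|last=Grimes|first=Seth|title=Big Data: Avoid 'Wanna V' Confusion| url=http://www.informationweek.com/big-data/big-data-analytics/big-data-avoid-wanna-v-confusion/d/d-id/1111077|magazine=[[InformationWeek]]|access-date = 5 January 2016}}</ref> The Vs of big data are often referred to as the three Vs, four Vs, and five Vs. They stand for the volume, variety, velocity, veracity, and value of big data.<ref name=":0">{{Cite web|date=2016-09-17|title=The 5 V's of big data|url=https://www.ibm.com/blogs/watson-health/the-5-vs-of-big-data/|access-date=2021-01-20|website=Watson Health Perspectives|language=en-US}}</ref> Variability is often regarded as an additional attribute of big data. |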
| + | |
| + | |
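| + | Under the architectural definition, big data is data whose volume, acquisition, or representation limits the ability to analyze it effectively with traditional relational approaches, and which requires horizontal scaling mechanisms for efficient processing. |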
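| + | A 2018 definition states that “Big data is where parallel computing tools are needed to handle data”, and notes that “This represents a distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the guarantees and capabilities of relational databases.”<ref>{{Cite book|last=Fox|first=Charles|date=25 March 2018|title=Data Science for Transport| url=https://www.springer.com/us/book/9783319729527|publisher=Springer|isbn=9783319729527|series=Springer Textbooks in Earth Sciences, Geography and Environment}}</ref> |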
| + | |
| | | |
| | | |
| In a comparative study of big data sets, Kitchin and McArdle found that none of the commonly cited characteristics of big data appeared consistently across all the cases analyzed.<ref>{{cite journal | last1 = Kitchin | first1 = Rob | last2 = McArdle | first2 = Gavin | year = 2016 | title = What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets | journal = Big Data & Society | volume = 3 | pages = 1–10 | doi = 10.1177/2053951716631130}}</ref> For this reason, other studies have identified the redefinition of power dynamics in knowledge discovery as the defining trait.<ref>{{cite journal | last1 = Balazka | first1 = Dominik | last2 = Rodighiero | first2 = Dario | year = 2020 | title = Big Data and the Little Big Bang: An Epistemological (R)evolution | journal = Frontiers in Big Data | volume = 3 | page = 31 | doi = 10.3389/fdata.2020.00031 | pmc = 7931920 | hdl = 1721.1/128865 | hdl-access = free | doi-access = free }}</ref> Rather than focusing on the intrinsic characteristics of big data, this alternative perspective advances a relational understanding of the object, holding that what matters is how data is collected, stored, made available, and analyzed. | | In a comparative study of big data sets, Kitchin and McArdle found that none of the commonly cited characteristics of big data appeared consistently across all the cases analyzed.<ref>{{cite journal | last1 = Kitchin | first1 = Rob | last2 = McArdle | first2 = Gavin | year = 2016 | title = What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets | journal = Big Data & Society | volume = 3 | pages = 1–10 | doi = 10.1177/2053951716631130}}</ref> For this reason, other studies have identified the redefinition of power dynamics in knowledge discovery as the defining trait.<ref>{{cite journal | last1 = Balazka | first1 = Dominik | last2 = Rodighiero | first2 = Dario | year = 2020 | title = Big Data and the Little Big Bang: An Epistemological (R)evolution | journal = Frontiers in Big Data | volume = 3 | page = 31 | doi = 10.3389/fdata.2020.00031 | pmc = 7931920 | hdl = 1721.1/128865 | hdl-access = free | doi-access = free }}</ref> Rather than focusing on the intrinsic characteristics of big data, this alternative perspective advances a relational understanding of the object, holding that what matters is how data is collected, stored, made available, and analyzed. |
− |
| |
| | | |
| === Big data vs. business intelligence === | | === Big data vs. business intelligence === |
Line 378: |
Line 382: |
| | | |
| | | |
− | === Criticism of the “ v” model === | + | === Criticism of the “ V” model === |
| The “V” model of big data is a concern because it centers on computational scalability and neglects the perceptibility and understandability of information. This led to the cognitive big data framework, which characterizes big data applications according to the following:<ref>{{Cite journal|last1=Lugmayr|first1=Artur|last2=Stockleben|first2=Bjoern|last3=Scheib|first3=Christoph|last4=Mailaparampil|first4=Mathew|last5=Mesia|first5=Noora|last6=Ranta|first6=Hannu|last7=Lab|first7=Emmi|date=1 June 2016|title=A Comprehensive Survey On Big-Data Research and Its Implications – What is Really 'New' in Big Data? – It's Cognitive Big Data! |url=https://www.researchgate.net/publication/304784955}}</ref> | | The “V” model of big data is a concern because it centers on computational scalability and neglects the perceptibility and understandability of information. This led to the cognitive big data framework, which characterizes big data applications according to the following:<ref>{{Cite journal|last1=Lugmayr|first1=Artur|last2=Stockleben|first2=Bjoern|last3=Scheib|first3=Christoph|last4=Mailaparampil|first4=Mathew|last5=Mesia|first5=Noora|last6=Ranta|first6=Hannu|last7=Lab|first7=Emmi|date=1 June 2016|title=A Comprehensive Survey On Big-Data Research and Its Implications – What is Really 'New' in Big Data? – It's Cognitive Big Data! |url=https://www.researchgate.net/publication/304784955}}</ref> |
| | | |