Line 319: |
Line 319: |
| | | |
| | | |
− | The initiative included a National Science Foundation "Expeditions in Computing" grant of $10 million over five years to the AMPLab<ref>{{cite web|url=http://amplab.cs.berkeley.edu |title=AMPLab at the University of California, Berkeley |publisher=Amplab.cs.berkeley.edu |access-date=5 March 2013}}</ref> at the University of California, Berkeley.<ref>{{cite web |title=NSF Leads Federal Efforts in Big Data|date=29 March 2012|publisher=National Science Foundation (NSF) |url= https://www.nsf.gov/news/news_summ.jsp?cntn_id=123607&org=NSF&from=news}}</ref> The AMPLab also received funds from DARPA and more than a dozen industrial sponsors, and uses big data to attack a wide range of problems, from predicting traffic congestion<ref>{{cite conference| url=https://amplab.cs.berkeley.edu/publication/scaling-the-mobile-millennium-system-in-the-cloud-2/|author1=Timothy Hunter|date=October 2011|author2=Teodor Moldovan|author3=Matei Zaharia| author4 =Justin Ma|author5=Michael Franklin|author6-link=Pieter Abbeel|author6=Pieter Abbeel|author7=Alexandre Bayen |title=Scaling the Mobile Millennium System in the Cloud}}</ref> to fighting cancer.<ref>{{cite news|title=Computer Scientists May Have What It Takes to Help Cure Cancer|author=David Patterson|work=The New York Times| date=5 December 2011 |url=https://www.nytimes.com/2011/12/06/science/david-patterson-enlist-computer-scientists-in-cancer-fight.html}}</ref> | + | The initiative included a National Science Foundation "Expeditions in Computing" grant of $10 million over five years to the AMPLab<ref>{{cite web|url=http://amplab.cs.berkeley.edu |title=AMPLab at the University of California, Berkeley |publisher=Amplab.cs.berkeley.edu |access-date=5 March 2013}}</ref> at the University of California, Berkeley.<ref>{{cite web |title=NSF Leads Federal Efforts in Big Data|date=29 March 2012|publisher=National Science Foundation (NSF) |url= https://www.nsf.gov/news/news_summ.jsp?cntn_id=123607&org=NSF&from=news}}</ref> The AMPLab also received funds from DARPA and more than a dozen industrial sponsors, and uses big data to attack a wide range of problems, from predicting traffic congestion<ref>{{cite conference| url=https://amplab.cs.berkeley.edu/publication/scaling-the-mobile-millennium-system-in-the-cloud-2/|author1=Timothy Hunter|date=October 2011|author2=Teodor Moldovan|author3=Matei Zaharia| author4 =Justin Ma|author5=Michael Franklin|author6=Pieter Abbeel|author7=Alexandre Bayen |title=Scaling the Mobile Millennium System in the Cloud}}</ref> to fighting cancer.<ref>{{cite news|title=Computer Scientists May Have What It Takes to Help Cure Cancer|author=David Patterson|work=The New York Times| date=5 December 2011 |url=https://www.nytimes.com/2011/12/06/science/david-patterson-enlist-computer-scientists-in-cancer-fight.html}}</ref> |
| | | |
| | | |
Line 340: |
Line 340: |
| | | |
| | | |
− | Tobias Preis and his colleagues Helen Susannah Moat and H. Eugene Stanley introduced a method for identifying online precursors of stock market moves, using trading strategies based on search volume data provided by Google Trends.<ref>{{cite journal | url =http://www.nature.com/news/counting-google-searches-predicts-market-movements-1.12879 | title=Counting Google searches predicts market movements | author=Philip Ball | journal=Nature | date=26 April 2013 | doi=10.1038/nature.2013.12879 | access-date=9 August 2013| author-link=Philip Ball }}</ref> Their analysis of [[Google]] search volume for 98 terms of varying financial relevance, published in ''[[Scientific Reports]]'',<ref>{{cite journal | vauthors = Preis T, Moat HS, Stanley HE | title = Quantifying trading behavior in financial markets using Google Trends | journal = Scientific Reports | volume = 3 | pages = 1684 | year = 2013 | pmid = 23619126 | pmc = 3635219 | doi = 10.1038/srep01684 | bibcode = 2013NatSR...3E1684P }}</ref> suggests that increases in search volume for financially relevant search terms tend to precede large losses in financial markets.<ref>{{cite news | url=http://bits.blogs.nytimes.com/2013/04/26/google-search-terms-can-predict-stock-market-study-finds/ | title= Google Search Terms Can Predict Stock Market, Study Finds | author=Nick Bilton | work=[[The New York Times]] | date=26 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite magazine | url=http://business.time.com/2013/04/26/trouble-with-your-investment-portfolio-google-it/ | title=Trouble With Your Investment Portfolio? Google It! | author=Christopher Matthews | magazine=[[Time (magazine)|Time]] | date=26 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite journal | url= http://www.nature.com/news/counting-google-searches-predicts-market-movements-1.12879 | title=Counting Google searches predicts market movements | author=Philip Ball |journal=[[Nature (journal)|Nature]] | date=26 April 2013 | doi=10.1038/nature.2013.12879 | access-date=9 August 2013}}</ref><ref>{{cite news | url=http://www.businessweek.com/articles/2013-04-25/big-data-researchers-turn-to-google-to-beat-the-markets | title='Big Data' Researchers Turn to Google to Beat the Markets | author=Bernhard Warner | work=[[Bloomberg Businessweek]] | date=25 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite news | url=https://www.independent.co.uk/news/business/comment/hamish-mcrae/hamish-mcrae-need-a-valuable-handle-on-investor-sentiment-google-it-8590991.html | title=Hamish McRae: Need a valuable handle on investor sentiment? Google it | author=Hamish McRae | work=[[The Independent]] | date=28 April 2013 | access-date=9 August 2013 | location=London}}</ref><ref>{{cite web | url=http://www.ft.com/intl/cms/s/0/e5d959b8-acf2-11e2-b27f-00144feabdc0.html | title= Google search proves to be new word in stock market prediction | author=Richard Waters | work=[[Financial Times]] | date=25 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite news | url =https://www.bbc.co.uk/news/science-environment-22293693 | title=Google searches predict market moves | author=Jason Palmer | work=[[BBC]] | date=25 April 2013 | access-date=9 August 2013}}</ref> | + | Tobias Preis and his colleagues Helen Susannah Moat and H. Eugene Stanley introduced a method for identifying online precursors of stock market moves, using trading strategies based on search volume data provided by Google Trends.<ref>{{cite journal | url =http://www.nature.com/news/counting-google-searches-predicts-market-movements-1.12879 | title=Counting Google searches predicts market movements | author=Philip Ball | journal=Nature | date=26 April 2013 | doi=10.1038/nature.2013.12879 | access-date=9 August 2013}}</ref> Their analysis of [[Google]] search volume for 98 terms of varying financial relevance, published in ''[[Scientific Reports]]'',<ref>{{cite journal | vauthors = Preis T, Moat HS, Stanley HE | title = Quantifying trading behavior in financial markets using Google Trends | journal = Scientific Reports | volume = 3 | pages = 1684 | year = 2013 | pmid = 23619126 | pmc = 3635219 | doi = 10.1038/srep01684 | bibcode = 2013NatSR...3E1684P }}</ref> suggests that increases in search volume for financially relevant search terms tend to precede large losses in financial markets.<ref>{{cite news | url=http://bits.blogs.nytimes.com/2013/04/26/google-search-terms-can-predict-stock-market-study-finds/ | title= Google Search Terms Can Predict Stock Market, Study Finds | author=Nick Bilton | work=[[The New York Times]] | date=26 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite magazine | url=http://business.time.com/2013/04/26/trouble-with-your-investment-portfolio-google-it/ | title=Trouble With Your Investment Portfolio? Google It! | author=Christopher Matthews | magazine=[[Time (magazine)|Time]] | date=26 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite journal | url= http://www.nature.com/news/counting-google-searches-predicts-market-movements-1.12879 | title=Counting Google searches predicts market movements | author=Philip Ball |journal=[[Nature (journal)|Nature]] | date=26 April 2013 | doi=10.1038/nature.2013.12879 | access-date=9 August 2013}}</ref><ref>{{cite news | url=http://www.businessweek.com/articles/2013-04-25/big-data-researchers-turn-to-google-to-beat-the-markets | title='Big Data' Researchers Turn to Google to Beat the Markets | author=Bernhard Warner | work=[[Bloomberg Businessweek]] | date=25 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite news | url=https://www.independent.co.uk/news/business/comment/hamish-mcrae/hamish-mcrae-need-a-valuable-handle-on-investor-sentiment-google-it-8590991.html | title=Hamish McRae: Need a valuable handle on investor sentiment? Google it | author=Hamish McRae | work=[[The Independent]] | date=28 April 2013 | access-date=9 August 2013 | location=London}}</ref><ref>{{cite web | url=http://www.ft.com/intl/cms/s/0/e5d959b8-acf2-11e2-b27f-00144feabdc0.html | title= Google search proves to be new word in stock market prediction | author=Richard Waters | work=[[Financial Times]] | date=25 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite news | url =https://www.bbc.co.uk/news/science-environment-22293693 | title=Google searches predict market moves | author=Jason Palmer | work=[[BBC]] | date=25 April 2013 | access-date=9 August 2013}}</ref> |
| | | |
| | | |
Line 353: |
Line 353: |
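| One research question about big data sets is whether the full data must be examined to draw certain conclusions about its properties, or whether a sample is good enough. The name big data itself contains a term related to size, which is an important characteristic of big data. Sampling, however, makes it possible to select the right data points from within the larger data set to estimate the characteristics of the whole population. In manufacturing, different types of sensory data such as acoustics, vibration, pressure, current, voltage, and controller data are available at short time intervals. To predict downtime it may not be necessary to look at all the data; a sample may be sufficient. Big data can be broken down by various data point categories such as demographic, psychographic, behavioral, and transactional data. With large sets of data points, marketers can create and use more customized consumer segments for more strategic targeting. | | One research question about big data sets is whether the full data must be examined to draw certain conclusions about its properties, or whether a sample is good enough. The name big data itself contains a term related to size, which is an important characteristic of big data. Sampling, however, makes it possible to select the right data points from within the larger data set to estimate the characteristics of the whole population. In manufacturing, different types of sensory data such as acoustics, vibration, pressure, current, voltage, and controller data are available at short time intervals. To predict downtime it may not be necessary to look at all the data; a sample may be sufficient. Big data can be broken down by various data point categories such as demographic, psychographic, behavioral, and transactional data. With large sets of data points, marketers can create and use more customized consumer segments for more strategic targeting. |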
| | | |
− | Some work has been done on sampling algorithms for big data. For example, a theoretical formulation for sampling Twitter data has been developed.<ref>{{cite conference |author1=Deepan Palguna |author2= Vikas Joshi |author3=Venkatesan Chakravarthy |author4=Ravi Kothari |author5=L. V. Subramaniam |name-list-style=amp | title=Analysis of Sampling Algorithms for Twitter | journal=[[International Joint Conference on Artificial Intelligence]] | year=2015 }}</ref> | + | Some work has been done on sampling algorithms for big data. For example, a theoretical formulation for sampling Twitter data has been developed.<ref>{{cite conference |author1=Deepan Palguna |author2= Vikas Joshi |author3=Venkatesan Chakravarthy |author4=Ravi Kothari |author5=L. V. Subramaniam | title=Analysis of Sampling Algorithms for Twitter | journal=[[International Joint Conference on Artificial Intelligence]] | year=2015 }}</ref> |
| | | |
| | | |
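As an illustration of the sampling idea in the paragraph above, the following is a minimal sketch, not a method taken from the cited sources. It uses reservoir sampling (Algorithm R) to draw a fixed-size uniform sample from a stream in one pass and compares the sample mean with the full mean. The function name `reservoir_sample` and the synthetic `readings` values are assumptions made purely for this example.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Draw k items uniformly at random from an iterable in a single pass.

    Uses Algorithm R: the first k items fill the reservoir, and item i
    (0-based, i >= k) replaces a random slot with probability k / (i + 1).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical sensor readings (synthetic values, not real data).
readings = [20.0 + (i % 50) * 0.1 for i in range(100_000)]

sample = reservoir_sample(readings, k=1_000)
full_mean = sum(readings) / len(readings)
sample_mean = sum(sample) / len(sample)
print(f"full mean   = {full_mean:.3f}")
print(f"sample mean = {sample_mean:.3f} (from {len(sample)} of {len(readings)} points)")
```

The reservoir never holds more than `k` items, so the same approach works when the stream is too large to keep in memory. How close the sample estimate is depends on the sample size and on the variability of the data.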
Line 390: |
Line 390: |
| | | |
| === Critiques of big data execution === | | === Critiques of big data execution === |
− | Ulf-Dietrich Reips and Uwe Matzat wrote in 2014 that big data had become a "fad" in scientific research.<ref name="pigdata" /> Researcher danah boyd has raised concerns that the use of big data in science neglects principles such as choosing a representative sample, because researchers are too concerned with handling huge amounts of data.<ref name="danah">{{cite web | url=http://www.danah.org/papers/talks/2010/WWW2010.html | title=Privacy and Publicity in the Context of Big Data | author=danah boyd | work=[[World Wide Web Conference|WWW 2010 conference]] | date=29 April 2010 | access-date = 18 April 2011| author-link=danah boyd }}</ref> This approach may lead to results that are biased in one way or another.<ref>{{Cite journal|last=Katyal|first=Sonia K.|date=2019|title=Artificial Intelligence, Advertising, and Disinformation|url=https://muse.jhu.edu/article/745987|journal=Advertising & Society Quarterly|language=en|volume=20|issue=4|doi=10.1353/asr.2019.0026|issn=2475-1790}}</ref> Integration across heterogeneous data resources (some that might be considered big data and others not) presents formidable logistical and analytical challenges, but many researchers argue that such integration is likely to represent the most promising new frontier in science.<ref>{{cite journal |last1=Jones |first1=MB |last2=Schildhauer |first2=MP |last3=Reichman |first3=OJ |last4=Bowers | first4=S |title=The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere | journal=Annual Review of Ecology, Evolution, and Systematics |volume=37 |issue=1 |pages=519–544 |year=2006 |doi=10.1146/annurev.ecolsys.37.091305.110031 |url= http://www.pnamp.org/sites/default/files/Jones2006_AREES.pdf }}</ref> In the provocative article "Critical Questions for Big Data",<ref name="danah2">{{cite journal | doi = 10.1080/1369118X.2012.678878| title = Critical Questions for Big Data| journal = Information, Communication & Society| volume = 15| issue = 5| pages = 662–679| year = 2012| last1 = Boyd | first1 = D. | last2 = Crawford | first2 = K. | hdl = 10983/1320| hdl-access = free}}</ref> the authors describe big data as part of a mythology: "large data sets offer a higher form of intelligence and knowledge ...". Users of big data are often "lost in the sheer volume of numbers", and "working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth".<ref name="danah2" /> Recent developments in the BI domain, such as pro-active reporting, aim in particular to improve the usability of big data through automated filtering of non-useful data and correlations.<ref name="Big Decisions White Paper">[http://www.fortewares.com/Administrator/userfiles/Banner/forte-wares--pro-active-reporting_EN.pdf Failure to Launch: From Big Data to Big Decisions] Forte Wares.</ref> Big data is full of spurious correlations,<ref>{{Cite web | url=https://www.tylervigen.com/spurious-correlations | title=15 Insane Things That Correlate with Each Other}}</ref> whether because of non-causal coincidences (the law of truly large numbers), the nature of big randomness alone<ref>[https://onlinelibrary.wiley.com/loi/10982418 Random structures & algorithms]</ref> (Ramsey theory), or the existence of factors not included in the data. The hope of early experimenters that large databases of numbers would "speak for themselves" and revolutionize the scientific method is therefore called into question.<ref>Cristian S. Calude, Giuseppe Longo, (2016), The Deluge of Spurious Correlations in Big Data, [[Foundations of Science]]</ref> | + | Ulf-Dietrich Reips and Uwe Matzat wrote in 2014 that big data had become a "fad" in scientific research.<ref name="pigdata" /> Researcher danah boyd has raised concerns that the use of big data in science neglects principles such as choosing a representative sample, because researchers are too concerned with handling huge amounts of data.<ref name="danah">{{cite web | url=http://www.danah.org/papers/talks/2010/WWW2010.html | title=Privacy and Publicity in the Context of Big Data | author=danah boyd | work=[[World Wide Web Conference|WWW 2010 conference]] | date=29 April 2010 | access-date = 18 April 2011}}</ref> This approach may lead to results that are biased in one way or another.<ref>{{Cite journal|last=Katyal|first=Sonia K.|date=2019|title=Artificial Intelligence, Advertising, and Disinformation|url=https://muse.jhu.edu/article/745987|journal=Advertising & Society Quarterly|language=en|volume=20|issue=4|doi=10.1353/asr.2019.0026|issn=2475-1790}}</ref> Integration across heterogeneous data resources (some that might be considered big data and others not) presents formidable logistical and analytical challenges, but many researchers argue that such integration is likely to represent the most promising new frontier in science.<ref>{{cite journal |last1=Jones |first1=MB |last2=Schildhauer |first2=MP |last3=Reichman |first3=OJ |last4=Bowers | first4=S |title=The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere | journal=Annual Review of Ecology, Evolution, and Systematics |volume=37 |issue=1 |pages=519–544 |year=2006 |doi=10.1146/annurev.ecolsys.37.091305.110031 |url= http://www.pnamp.org/sites/default/files/Jones2006_AREES.pdf }}</ref> In the provocative article "Critical Questions for Big Data",<ref name="danah2">{{cite journal | doi = 10.1080/1369118X.2012.678878| title = Critical Questions for Big Data| journal = Information, Communication & Society| volume = 15| issue = 5| pages = 662–679| year = 2012| last1 = Boyd | first1 = D. | last2 = Crawford | first2 = K. | hdl = 10983/1320| hdl-access = free}}</ref> the authors describe big data as part of a mythology: "large data sets offer a higher form of intelligence and knowledge ...". Users of big data are often "lost in the sheer volume of numbers", and "working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth".<ref name="danah2" /> Recent developments in the BI domain, such as pro-active reporting, aim in particular to improve the usability of big data through automated filtering of non-useful data and correlations.<ref name="Big Decisions White Paper">[http://www.fortewares.com/Administrator/userfiles/Banner/forte-wares--pro-active-reporting_EN.pdf Failure to Launch: From Big Data to Big Decisions] Forte Wares.</ref> Big data is full of spurious correlations,<ref>{{Cite web | url=https://www.tylervigen.com/spurious-correlations | title=15 Insane Things That Correlate with Each Other}}</ref> whether because of non-causal coincidences (the law of truly large numbers), the nature of big randomness alone<ref>[https://onlinelibrary.wiley.com/loi/10982418 Random structures & algorithms]</ref> (Ramsey theory), or the existence of factors not included in the data. The hope of early experimenters that large databases of numbers would "speak for themselves" and revolutionize the scientific method is therefore called into question.<ref>Cristian S. Calude, Giuseppe Longo, (2016), The Deluge of Spurious Correlations in Big Data, [[Foundations of Science]]</ref> |
| | | |
| | | |
− | Big data analysis is often shallow compared to analysis of smaller data sets.<ref name="kdnuggets-berchthold">{{cite web|url=http://www.kdnuggets.com/2014/08/interview-michael-berthold-knime-research-big-data-privacy-part2.html|title=Interview: Michael Berthold, KNIME Founder, on Research, Creativity, Big Data, and Privacy, Part 2|date=12 August 2014|author=Gregory Piatetsky| author-link= Gregory I. Piatetsky-Shapiro|publisher=KDnuggets|access-date=13 August 2014}}</ref> In many big data projects no large-scale data analysis actually takes place; the challenge lies instead in the extract, transform, load stage of data pre-processing.<ref name="kdnuggets-berchthold" /> | + | Big data analysis is often shallow compared to analysis of smaller data sets.<ref name="kdnuggets-berchthold">{{cite web|url=http://www.kdnuggets.com/2014/08/interview-michael-berthold-knime-research-big-data-privacy-part2.html|title=Interview: Michael Berthold, KNIME Founder, on Research, Creativity, Big Data, and Privacy, Part 2|date=12 August 2014|author=Gregory Piatetsky|publisher=KDnuggets|access-date=13 August 2014}}</ref> In many big data projects no large-scale data analysis actually takes place; the challenge lies instead in the extract, transform, load stage of data pre-processing.<ref name="kdnuggets-berchthold" /> |
| | | |
| | | |
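− | Big data is a buzzword and a "vague term",<ref>{{cite news|last1=Pelt|first1=Mason|title="Big Data" is an over used buzzword and this Twitter bot proves it|url= http://siliconangle.com/blog/2015/10/26/big-data-is-an-over-used-buzzword-and-this-twitter-bot-proves-it/ |newspaper=Siliconangle|access-date=4 November 2015|date=26 October 2015}}</ref><ref name="ft-harford">{{cite web |url=http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html |title=Big data: are we making a big mistake? |last1=Harford |first1=Tim |date=28 March 2014 |website=[[Financial Times]] |access-date=7 April 2014}}</ref> but at the same time a preoccupation of entrepreneurs, consultants, scientists, and the media.<ref name="ft-harford" /> Big data showcases such as Google Flu Trends failed to deliver good predictions in recent years, overstating flu outbreaks by a factor of two. Similarly, Academy Awards and election predictions based on Twitter were often off target. Big data often poses the same challenges as small data; adding more data does not solve problems of bias, and may even accentuate other problems. In particular, data sources such as Twitter are not representative of the overall population, and results drawn from such sources may lead to wrong conclusions. Google Translate, which is based on big data statistical analysis of text, does a good job of translating web pages. However, results from specialized domains may be heavily skewed. On the other hand, big data may also introduce new problems, such as the multiple comparisons problem: simultaneously testing a large set of hypotheses is likely to produce many false results that mistakenly appear significant. Ioannidis argued that "most published research findings are false"<ref name="Ioannidis">{{cite journal | vauthors = Ioannidis JP | title = Why most published research findings are false | journal = PLOS Medicine | volume = 2 | issue = 8 | pages = e124 | date = August 2005 | pmid = 16060722 | pmc = 1182327 | doi = 10.1371/journal.pmed.0020124 | author-link1 = John P. A. Ioannidis }}</ref> for essentially the same reason: when many scientific teams and researchers each perform many experiments (that is, process a large amount of scientific data, although not with big data technology), the likelihood that a "significant" result is false grows quickly, and even more so when only positive results are published. | + | Big data is a buzzword and a "vague term",<ref>{{cite news|last1=Pelt|first1=Mason|title="Big Data" is an over used buzzword and this Twitter bot proves it|url= http://siliconangle.com/blog/2015/10/26/big-data-is-an-over-used-buzzword-and-this-twitter-bot-proves-it/ |newspaper=Siliconangle|access-date=4 November 2015|date=26 October 2015}}</ref><ref name="ft-harford">{{cite web |url=http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html |title=Big data: are we making a big mistake? |last1=Harford |first1=Tim |date=28 March 2014 |website=[[Financial Times]] |access-date=7 April 2014}}</ref> but at the same time a preoccupation of entrepreneurs, consultants, scientists, and the media.<ref name="ft-harford" /> Big data showcases such as Google Flu Trends failed to deliver good predictions in recent years, overstating flu outbreaks by a factor of two. Similarly, Academy Awards and election predictions based on Twitter were often off target. Big data often poses the same challenges as small data; adding more data does not solve problems of bias, and may even accentuate other problems. In particular, data sources such as Twitter are not representative of the overall population, and results drawn from such sources may lead to wrong conclusions. Google Translate, which is based on big data statistical analysis of text, does a good job of translating web pages. However, results from specialized domains may be heavily skewed. On the other hand, big data may also introduce new problems, such as the multiple comparisons problem: simultaneously testing a large set of hypotheses is likely to produce many false results that mistakenly appear significant. Ioannidis argued that "most published research findings are false"<ref name="Ioannidis">{{cite journal | vauthors = Ioannidis JP | title = Why most published research findings are false | journal = PLOS Medicine | volume = 2 | issue = 8 | pages = e124 | date = August 2005 | pmid = 16060722 | pmc = 1182327 | doi = 10.1371/journal.pmed.0020124 }}</ref> for essentially the same reason: when many scientific teams and researchers each perform many experiments (that is, process a large amount of scientific data, although not with big data technology), the likelihood that a "significant" result is false grows quickly, and even more so when only positive results are published. |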
| | | |
| Furthermore, the results of big data analytics are only as good as the model on which they are predicated. For example, big data was used in attempts to predict the outcome of the 2016 U.S. presidential election,<ref>{{Cite news|url=https://www.nytimes.com/2016/11/10/technology/the-data-said-clinton-would-win-why-you-shouldnt-have-believed-it.html|title=How Data Failed Us in Calling an Election |last1=Lohr|first1=Steve|date=10 November 2016|last2=Singer|first2=Natasha|newspaper=The New York Times|issn=0362-4331|access-date=27 November 2016}}</ref> with varying degrees of success. | | Furthermore, the results of big data analytics are only as good as the model on which they are predicated. For example, big data was used in attempts to predict the outcome of the 2016 U.S. presidential election,<ref>{{Cite news|url=https://www.nytimes.com/2016/11/10/technology/the-data-said-clinton-would-win-why-you-shouldnt-have-believed-it.html|title=How Data Failed Us in Calling an Election |last1=Lohr|first1=Steve|date=10 November 2016|last2=Singer|first2=Natasha|newspaper=The New York Times|issn=0362-4331|access-date=27 November 2016}}</ref> with varying degrees of success. |
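The multiple comparisons problem described in this section can be made concrete with a short calculation. The sketch below is illustrative only and rests on an assumption the article does not state: the m hypothesis tests are independent and all null hypotheses are true. Under that assumption, the chance of at least one false "significant" result is 1 − (1 − α)^m. The function names are hypothetical, and the Bonferroni correction is shown as one common remedy, not as the approach used by any cited study.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent
    tests, each run at significance level alpha, when every null
    hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha (Bonferroni correction)."""
    return alpha / m


alpha = 0.05
for m in (1, 10, 100, 1000):
    uncorrected = familywise_error_rate(alpha, m)
    corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
    print(f"m={m:5d}  uncorrected={uncorrected:.4f}  bonferroni={corrected:.4f}")
```

With α = 0.05, a single test has a 5% false-positive risk, but across 100 independent tests the chance of at least one spurious "significant" finding exceeds 99%. Dividing the threshold by m brings the family-wise rate back below α at the cost of statistical power.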
Line 455: |
Line 455: |
| ==Further reading== | | ==Further reading== |
| *{{cite magazine|editor1=Peter Kinnaird |editor2=Inbal Talgam-Cohen|magazine=[[XRDS (magazine)|XRDS: Crossroads, The ACM Magazine for Students]]|title=Big Data|volume=19 |issue=1|date=2012|publisher=[[Association for Computing Machinery]]|issn=1528-4980 |oclc=779657714 |url=http://dl.acm.org/citation.cfm?id=2331042}} | | *{{cite magazine|editor1=Peter Kinnaird |editor2=Inbal Talgam-Cohen|magazine=[[XRDS (magazine)|XRDS: Crossroads, The ACM Magazine for Students]]|title=Big Data|volume=19 |issue=1|date=2012|publisher=[[Association for Computing Machinery]]|issn=1528-4980 |oclc=779657714 |url=http://dl.acm.org/citation.cfm?id=2331042}} |
− | *{{cite book|title=Mining of massive datasets|author1=Jure Leskovec|author2-link=Anand Rajaraman|author2=Anand Rajaraman|author3-link=Jeffrey D. Ullman|author3=Jeffrey D. Ullman|year=2014|publisher=Cambridge University Press|url=http://mmds.org/|isbn=9781107077232 |oclc=888463433|author1-link=Jure Leskovec}} | + | *{{cite book|title=Mining of massive datasets|author1=Jure Leskovec|author2=Anand Rajaraman|author3=Jeffrey D. Ullman|year=2014|publisher=Cambridge University Press|url=http://mmds.org/|isbn=9781107077232 |oclc=888463433|author1-link=Jure Leskovec}} |
− | *{{cite book|author1=Viktor Mayer-Schönberger|author2-link=Kenneth Cukier|author2=Kenneth Cukier|title=Big Data: A Revolution that Will Transform how We Live, Work, and Think|date=2013|publisher=Houghton Mifflin Harcourt|isbn=9781299903029 |oclc=828620988|author1-link=Viktor Mayer-Schönberger}} | + | *{{cite book|author1=Viktor Mayer-Schönberger|author2=Kenneth Cukier|title=Big Data: A Revolution that Will Transform how We Live, Work, and Think|date=2013|publisher=Houghton Mifflin Harcourt|isbn=9781299903029 |oclc=828620988}} |
| *{{cite news |url=https://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data |title=A Very Short History of Big Data |first=Gil |last=Press |work=forbes.com |date=9 May 2013 |access-date=17 September 2016 |location=Jersey City, NJ}} | | *{{cite news |url=https://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data |title=A Very Short History of Big Data |first=Gil |last=Press |work=forbes.com |date=9 May 2013 |access-date=17 September 2016 |location=Jersey City, NJ}} |
| *{{cite book |title=Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are |year=2017 |first=Seth |last=Stephens-Davidowitz |publisher=Dey Street Books |isbn=978-0062390851}} | | *{{cite book |title=Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are |year=2017 |first=Seth |last=Stephens-Davidowitz |publisher=Dey Street Books |isbn=978-0062390851}} |