===COVID-19===
 
During the [[COVID-19 pandemic]], big data was raised as a way to minimise the impact of the disease. Significant applications of big data included minimising the spread of the virus, case identification and development of medical treatment.<ref>{{cite journal |last1=Haleem |first1=Abid |last2=Javaid |first2=Mohd |last3=Khan |first3=Ibrahim |last4=Vaishya |first4=Raju |title=Significant Applications of Big Data in COVID-19 Pandemic |journal=Indian Journal of Orthopaedics |date=2020 |volume=54 |issue=4 |pages=526–528 |doi=10.1007/s43465-020-00129-z |pmid=32382166 |pmc=7204193 }}</ref>
Governments used big data to track infected people to minimise spread. Early adopters included China, Taiwan, South Korea, and Israel.<ref>{{cite news |last1=Manancourt |first1=Vincent |title=Coronavirus tests Europe's resolve on privacy |url=https://www.politico.eu/article/coronavirus-tests-europe-resolve-on-privacy-tracking-apps-germany-italy/ |access-date=30 October 2020 |work=Politico |date=10 March 2020}}</ref><ref>{{cite news |last1=Choudhury |first1=Amit Roy |title=Gov in the Time of Corona |url=https://govinsider.asia/innovation/gov-in-the-time-of-corona/ |access-date=30 October 2020 |work=Gov Insider |date=27 March 2020}}</ref><ref>{{cite news |last1=Cellan-Jones |first1=Rory |title=China launches coronavirus 'close contact detector' app |url=https://www.bbc.com/news/technology-51439401 |access-date=30 October 2020 |work=BBC |date=11 February 2020|archive-url=https://web.archive.org/web/20200228003957/https://www.bbc.com/news/technology-51439401 |archive-date=28 February 2020 }}</ref>
==Research activities==
Encrypted search and cluster formation in big data were demonstrated in March 2014 at the American Society for Engineering Education. Gautam Siwach of the [[MIT Computer Science and Artificial Intelligence Laboratory]], presenting ''Tackling the challenges of Big Data'', and Amir Esmailpour of the UNH Research Group investigated key features of big data such as the formation of clusters and their interconnections. They focused on the security of big data and on the presence of different types of data in encrypted form at the cloud interface, providing raw definitions and real-time examples within the technology. They also proposed an approach for identifying the encoding technique, so as to advance toward expedited search over encrypted text and thereby enhance security in big data.<ref>{{cite conference |url=http://asee-ne.org/proceedings/2014/Student%20Papers/210.pdf |title=Encrypted Search & Cluster Formation in Big Data |last1=Siwach |first1=Gautam |last2=Esmailpour |first2=Amir |date=March 2014 |conference=ASEE 2014 Zone I Conference |conference-url=http://ubconferences.org/ |location=[[University of Bridgeport]], [[Bridgeport, Connecticut|Bridgeport]], Connecticut, US |access-date=26 July 2014 |archive-url=https://web.archive.org/web/20140809045242/http://asee-ne.org/proceedings/2014/Student%20Papers/210.pdf |archive-date=9 August 2014 |url-status=dead }}</ref>
In March 2012, The White House announced a national "Big Data Initiative" that consisted of six federal departments and agencies committing more than $200&nbsp;million to big data research projects.<ref>{{cite web|title=Obama Administration Unveils "Big Data" Initiative:Announces $200 Million in New R&D Investments| url=https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf |url-status =live| archive-url =https://web.archive.org/web/20170121233309/https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf |via=[[NARA|National Archives]]|work=[[Office of Science and Technology Policy]]|archive-date=21 January 2017}}</ref>
The initiative included a National Science Foundation "Expeditions in Computing" grant of $10 million over five years to the AMPLab<ref>{{cite web|url=http://amplab.cs.berkeley.edu |title=AMPLab at the University of California, Berkeley |publisher=Amplab.cs.berkeley.edu |access-date=5 March 2013}}</ref> at the University of California, Berkeley.<ref>{{cite web |title=NSF Leads Federal Efforts in Big Data|date=29 March 2012|publisher=National Science Foundation (NSF) |url= https://www.nsf.gov/news/news_summ.jsp?cntn_id=123607&org=NSF&from=news}}</ref> The AMPLab also received funds from [[DARPA]], and over a dozen industrial sponsors and uses big data to attack a wide range of problems from predicting traffic congestion<ref>{{cite conference| url=https://amplab.cs.berkeley.edu/publication/scaling-the-mobile-millennium-system-in-the-cloud-2/|author1=Timothy Hunter|date=October 2011|author2=Teodor Moldovan|author3=Matei Zaharia| author4 =Justin Ma|author5=Michael Franklin|author6-link=Pieter Abbeel|author6=Pieter Abbeel|author7=Alexandre Bayen |title=Scaling the Mobile Millennium System in the Cloud}}</ref> to fighting cancer.<ref>{{cite news|title=Computer Scientists May Have What It Takes to Help Cure Cancer|author=David Patterson|work=The New York Times| date=5 December 2011 |url=https://www.nytimes.com/2011/12/06/science/david-patterson-enlist-computer-scientists-in-cancer-fight.html}}</ref>
The White House Big Data Initiative also included a commitment by the Department of Energy to provide $25 million in funding over five years to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute,<ref>{{cite web|title=Secretary Chu Announces New Institute to Help Scientists Improve Massive Data Set Research on DOE Supercomputers |publisher=energy.gov |url=http://energy.gov/articles/secretary-chu-announces-new-institute-help-scientists-improve-massive-data-set-research-doe}}</ref> led by the Energy Department's [[Lawrence Berkeley National Laboratory]]. The SDAV Institute aims to bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the department's supercomputers.
The U.S. state of [[Massachusetts]] announced the Massachusetts Big Data Initiative in May 2012, which provides funding from the state government and private companies to a variety of research institutions.<ref>{{Cite news|last=Young|first=Shannon|date=2012-05-30|title=Mass. governor, MIT announce big data initiative|work=Boston.com|url=http://archive.boston.com/news/local/massachusetts/articles/2012/05/30/mass_gov_and_mit_to_announce_data_initiative/|access-date=2021-07-29}}</ref> The [[Massachusetts Institute of Technology]] hosts the Intel Science and Technology Center for Big Data in the [[MIT Computer Science and Artificial Intelligence Laboratory]], combining government, corporate, and institutional funding and research efforts.<ref>{{cite web|url=http://bigdata.csail.mit.edu/ |title=Big Data @ CSAIL |publisher= Bigdata.csail.mit.edu |date=22 February 2013 |access-date=5 March 2013}}</ref>
The European Commission is funding the two-year-long Big Data Public Private Forum through their Seventh Framework Program to engage companies, academics and other stakeholders in discussing big data issues. The project aims to define a strategy in terms of research and innovation to guide supporting actions from the European Commission in the successful implementation of the big data economy. Outcomes of this project will be used as input for [[Horizon 2020]], their next [[Framework Programmes for Research and Technological Development|framework program]].<ref>{{cite web |url=https://cordis.europa.eu/project/id/318062 |title=Big Data Public Private Forum |publisher=cordis.europa.eu |date=1 September 2012 |access-date=16 March 2020 }}</ref>
The British government announced in March 2014 the founding of the [[Alan Turing Institute]], named after the computer pioneer and code-breaker, which will focus on new ways to collect and analyze large data sets.<ref>{{cite news|url=https://www.bbc.co.uk/news/technology-26651179|title=Alan Turing Institute to be set up to research big data|work=[[BBC News]]|access-date=19 March 2014|date=19 March 2014}}</ref>
At the [[University of Waterloo Stratford Campus]] Canadian Open Data Experience (CODE) Inspiration Day, participants demonstrated how using data visualization can increase the understanding and appeal of big data sets and communicate their story to the world.<ref>{{cite web|url= http://www.betakit.com/event/inspiration-day-at-university-of-waterloo-stratford-campus/| title=Inspiration day at University of Waterloo, Stratford Campus |publisher=betakit.com/ |access-date=28 February 2014}}</ref>
[[Computational social science|Computational social sciences]]&nbsp;– Anyone can use application programming interfaces (APIs) provided by big data holders, such as Google and Twitter, to do research in the social and behavioral sciences.<ref name=pigdata>{{cite journal|last=Reips|first=Ulf-Dietrich|author2=Matzat, Uwe |title=Mining "Big Data" using Big Data Services |journal=International Journal of Internet Science |year=2014|volume=1|issue=1|pages=1–8 | url=http://www.ijis.net/ijis9_1/ijis9_1_editorial_pre.html}}</ref> Often these APIs are provided for free.<ref name="pigdata" /> [[Tobias Preis]] et al. used [[Google Trends]] data to demonstrate that Internet users from countries with a higher per capita gross domestic products (GDPs) are more likely to search for information about the future than information about the past. The findings suggest there may be a link between online behaviors and real-world economic indicators.<ref>{{cite journal | vauthors = Preis T, Moat HS, Stanley HE, Bishop SR | title = Quantifying the advantage of looking forward | journal = Scientific Reports | volume = 2 | pages = 350 | year = 2012 | pmid = 22482034 | pmc = 3320057 | doi = 10.1038/srep00350 | bibcode = 2012NatSR...2E.350P }}</ref><ref>{{cite news | url=https://www.newscientist.com/article/dn21678-online-searches-for-future-linked-to-economic-success.html | title=Online searches for future linked to economic success |first=Paul |last=Marks |work=New Scientist | date=5 April 2012 | access-date=9 April 2012}}</ref><ref>{{cite news | url=https://arstechnica.com/gadgets/news/2012/04/google-trends-reveals-clues-about-the-mentality-of-richer-nations.ars | title=Google Trends reveals clues about the mentality of richer nations |first=Casey |last=Johnston |work=Ars Technica | date=6 April 2012 | access-date=9 April 2012}}</ref> The authors of the study examined Google queries logs made by ratio of the volume of searches for the coming year (2011) to the volume of searches for the previous year (2009), which they call the "[[future orientation index]]".<ref>{{cite web | url = http://www.tobiaspreis.de/bigdata/future_orientation_index.pdf | title = Supplementary Information: The Future Orientation Index is available for download | author = Tobias Preis | date = 24 May 2012 | access-date = 24 May 2012}}</ref> They compared the future orientation index to the per capita GDP of each country, and found a strong tendency for countries where Google users inquire more about the future to have a higher GDP.
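The index itself is a simple ratio, so a minimal sketch of the computation is given below; the per-country query counts and GDP figures are hypothetical placeholders rather than the data of the original study, which used Google Trends.

<syntaxhighlight lang="python">
# Illustrative sketch of a "future orientation index": the ratio of search
# volume for the coming year to search volume for the previous year.
# All numbers below are hypothetical placeholders.

def future_orientation_index(volume_next_year, volume_prev_year):
    return volume_next_year / volume_prev_year

countries = {
    "Country A": {"next": 1200.0, "prev": 400.0, "gdp_per_capita": 45000},
    "Country B": {"next": 300.0, "prev": 600.0, "gdp_per_capita": 9000},
}

for name, c in countries.items():
    foi = future_orientation_index(c["next"], c["prev"])
    print(f"{name}: future orientation index = {foi:.2f}, "
          f"GDP per capita = {c['gdp_per_capita']}")
# The study compared such an index with per-capita GDP across many countries.
</syntaxhighlight>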
[[Tobias Preis]] and his colleagues Helen Susannah Moat and [[H. Eugene Stanley]] introduced a method to identify online precursors for stock market moves, using trading strategies based on search volume data provided by Google Trends.<ref>{{cite journal | url =http://www.nature.com/news/counting-google-searches-predicts-market-movements-1.12879 | title=Counting Google searches predicts market movements | author=Philip Ball | journal=Nature | date=26 April 2013 | doi=10.1038/nature.2013.12879 | s2cid=167357427 | access-date=9 August 2013| author-link=Philip Ball }}</ref> Their analysis of [[Google]] search volume for 98 terms of varying financial relevance, published in ''[[Scientific Reports]]'',<ref>{{cite journal | vauthors = Preis T, Moat HS, Stanley HE | title = Quantifying trading behavior in financial markets using Google Trends | journal = Scientific Reports | volume = 3 | pages = 1684 | year = 2013 | pmid = 23619126 | pmc = 3635219 | doi = 10.1038/srep01684 | bibcode = 2013NatSR...3E1684P }}</ref> suggests that increases in search volume for financially relevant search terms tend to precede large losses in financial markets.<ref>{{cite news | url=http://bits.blogs.nytimes.com/2013/04/26/google-search-terms-can-predict-stock-market-study-finds/ | title= Google Search Terms Can Predict Stock Market, Study Finds | author=Nick Bilton | work=[[The New York Times]] | date=26 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite magazine | url=http://business.time.com/2013/04/26/trouble-with-your-investment-portfolio-google-it/ | title=Trouble With Your Investment Portfolio? Google It! | author=Christopher Matthews | magazine=[[Time (magazine)|Time]] | date=26 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite journal | url= http://www.nature.com/news/counting-google-searches-predicts-market-movements-1.12879 | title=Counting Google searches predicts market movements | author=Philip Ball |journal=[[Nature (journal)|Nature]] | date=26 April 2013 | doi=10.1038/nature.2013.12879 | s2cid=167357427 | access-date=9 August 2013}}</ref><ref>{{cite news | url=http://www.businessweek.com/articles/2013-04-25/big-data-researchers-turn-to-google-to-beat-the-markets | title='Big Data' Researchers Turn to Google to Beat the Markets | author=Bernhard Warner | work=[[Bloomberg Businessweek]] | date=25 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite news | url=https://www.independent.co.uk/news/business/comment/hamish-mcrae/hamish-mcrae-need-a-valuable-handle-on-investor-sentiment-google-it-8590991.html | title=Hamish McRae: Need a valuable handle on investor sentiment? Google it | author=Hamish McRae | work=[[The Independent]] | date=28 April 2013 | access-date=9 August 2013 | location=London}}</ref><ref>{{cite web | url=http://www.ft.com/intl/cms/s/0/e5d959b8-acf2-11e2-b27f-00144feabdc0.html | title= Google search proves to be new word in stock market prediction | author=Richard Waters | work=[[Financial Times]] | date=25 April 2013 | access-date=9 August 2013}}</ref><ref>{{cite news | url =https://www.bbc.co.uk/news/science-environment-22293693 | title=Google searches predict market moves | author=Jason Palmer | work=[[BBC]] | date=25 April 2013 | access-date=9 August 2013}}</ref>
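The published strategy is more involved, but the heavily simplified sketch below conveys the idea under stated assumptions: weekly search volume is compared with its recent average, and the position for the following week is set accordingly. The threshold rule and both series are invented placeholders, not Google Trends or market data.

<syntaxhighlight lang="python">
# Hedged, simplified sketch of a search-volume-driven trading rule:
# go short for the next week when search volume rises above its recent
# average, otherwise go long. All series below are invented placeholders.

def decide_positions(search_volume, window=3):
    """Return +1 (long) or -1 (short) for each week after the warm-up window."""
    decisions = []
    for t in range(window, len(search_volume) - 1):
        recent_average = sum(search_volume[t - window:t]) / window
        decisions.append(-1 if search_volume[t] > recent_average else +1)
    return decisions

search_volume = [10, 12, 11, 15, 18, 14, 13, 20, 22, 19]                      # weekly volumes
weekly_return = [0.01, -0.02, 0.005, -0.03, 0.01, 0.02, -0.01, -0.04, 0.02]   # index returns

window = 3
growth = 1.0
for position, ret in zip(decide_positions(search_volume, window), weekly_return[window:]):
    growth *= 1 + position * ret
print(f"cumulative growth of the toy strategy: {growth:.3f}")
</syntaxhighlight>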
Big data sets come with algorithmic challenges that previously did not exist. Hence, there is seen by some to be a need to fundamentally change the processing ways.<ref>E. Sejdić (March 2014). "Adapt current tools for use with big data". ''Nature''. '''507''' (7492): 306.</ref>
The Workshops on Algorithms for Modern Massive Data Sets (MMDS) bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to discuss algorithmic challenges of big data.<ref>Stanford. [https://web.stanford.edu/group/mmds/ "MMDS. Workshop on Algorithms for Modern Massive Data Sets"].</ref> Regarding big data, such concepts of magnitude are relative. As it is stated "If the past is of any guidance, then today's big data most likely will not be considered as such in the near future."<ref name=CAD7challenges/>
===Sampling big data===
      
A research question that is asked about big data sets is whether it is necessary to look at the full data to draw certain conclusions about the properties of the data, or whether a sample is good enough. The name big data itself contains a term related to size, and this is an important characteristic of big data. But [[Sampling (statistics)|sampling]] enables the selection of the right data points from within the larger data set to estimate the characteristics of the whole population. In manufacturing, different types of sensory data such as acoustics, vibration, pressure, current, voltage, and controller data are available at short time intervals. To predict downtime it may not be necessary to look at all the data; a sample may be sufficient. Big data can be broken down by various data point categories such as demographic, psychographic, behavioral, and transactional data. With large sets of data points, marketers are able to create and use more customized segments of consumers for more strategic targeting.
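The sketch below illustrates the point with synthetic sensor readings: a small random sample gives nearly the same estimate of an exceedance rate as a scan of the full stream. The data, threshold and sample size are assumptions made only for the example.

<syntaxhighlight lang="python">
# Illustrative sketch: estimating a characteristic of the full data set from a
# random sample. The vibration readings and threshold are synthetic.
import random

random.seed(0)
full_stream = [random.gauss(1.0, 0.3) for _ in range(1_000_000)]  # "all" sensor readings
threshold = 1.5  # readings above this are treated as a downtime precursor here

full_rate = sum(x > threshold for x in full_stream) / len(full_stream)

sample = random.sample(full_stream, 10_000)  # 1% simple random sample
sample_rate = sum(x > threshold for x in sample) / len(sample)

print(f"exceedance rate, full data: {full_rate:.4f}")
print(f"exceedance rate, 1% sample: {sample_rate:.4f}")
</syntaxhighlight>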
There has been some work done in sampling algorithms for big data. A theoretical formulation for sampling Twitter data has been developed.<ref>{{cite conference |author1=Deepan Palguna |author2= Vikas Joshi |author3=Venkatesan Chakravarthy |author4=Ravi Kothari |author5=L. V. Subramaniam |name-list-style=amp | title=Analysis of Sampling Algorithms for Twitter | journal=[[International Joint Conference on Artificial Intelligence]] | year=2015 }}</ref>
       
==Critique==
Critiques of the big data paradigm come in two flavors: those that question the implications of the approach itself, and those that question the way it is currently done.<ref name="Kimble and Milolidakis (2015)">{{Cite Q|Q56532925}}</ref> One approach to this criticism is the field of [[critical data studies]].
===Critiques of the big data paradigm===
"A crucial problem is that we do not know much about the underlying empirical micro-processes that lead to the emergence of the[se] typical network characteristics of Big Data."<ref name="Editorial" /> In their critique, Snijders, Matzat, and [[Ulf-Dietrich Reips|Reips]] point out that often very strong assumptions are made about mathematical properties that may not at all reflect what is really going on at the level of micro-processes. Mark Graham has leveled broad critiques at [[Chris Anderson (writer)|Chris Anderson]]'s assertion that big data will spell the end of theory:<ref>{{Cite magazine|url=https://www.wired.com/science/discoveries/magazine/16-07/pb_theory|title=The End of Theory: The Data Deluge Makes the Scientific Method Obsolete|author=Chris Anderson|date=23 June 2008|magazine=Wired}}</ref> focusing in particular on the notion that big data must always be contextualized in their social, economic, and political contexts.<ref>{{cite news |author=Graham M. |title=Big data and the end of theory? |newspaper=The Guardian |url= https://www.theguardian.com/news/datablog/2012/mar/09/big-data-theory |location=London |date=9 March 2012}}</ref> Even as companies invest eight- and nine-figure sums to derive insight from information streaming in from suppliers and customers, less than 40% of employees have sufficiently mature processes and skills to do so. To overcome this insight deficit, big data, no matter how comprehensive or well analyzed, must be complemented by "big judgment", according to an article in the ''[[Harvard Business Review]]''.<ref>{{cite journal|title=Good Data Won't Guarantee Good Decisions |journal=[[Harvard Business Review]]|url=http://hbr.org/2012/04/good-data-wont-guarantee-good-decisions/ar/1|author=Shah, Shvetank|author2=Horne, Andrew|author3=Capellá, Jaime |access-date=8 September 2012|date=April 2012}}</ref>
Much in the same line, it has been pointed out that the decisions based on the analysis of big data are inevitably "informed by the world as it was in the past, or, at best, as it currently is".<ref name="HilbertBigData2013">Hilbert, M. (2016). Big Data for Development: A Review of Promises and Challenges. Development Policy Review, 34(1), 135–174. https://doi.org/10.1111/dpr.12142 free access: https://www.martinhilbert.net/big-data-for-development/</ref> Fed by a large number of data on past experiences, algorithms can predict future development if the future is similar to the past.<ref name="HilbertTEDx">[https://www.youtube.com/watch?v=UXef6yfJZAI Big Data requires Big Visions for Big Change.], Hilbert, M. (2014). London: TEDx UCL, x=independently organized TED talks</ref> If the system's dynamics of the future change (if it is not a [[stationary process]]), the past can say little about the future. In order to make predictions in changing environments, it would be necessary to have a thorough understanding of the systems dynamic, which requires theory.<ref name="HilbertTEDx"/> As a response to this critique Alemany Oliver and Vayre suggest to use "abductive reasoning as a first step in the research process in order to bring context to consumers' digital traces and make new theories emerge".<ref>{{cite journal|last=Alemany Oliver|first=Mathieu |author2=Vayre, Jean-Sebastien |s2cid=111360835 |title= Big Data and the Future of Knowledge Production in Marketing Research: Ethics, Digital Traces, and Abductive Reasoning|journal=Journal of Marketing Analytics |year=2015|volume=3|issue=1|doi= 10.1057/jma.2015.1|pages=5–13}}</ref>
Additionally, it has been suggested to combine big data approaches with computer simulations, such as [[agent-based model]]s<ref name="HilbertBigData2013" /> and [[complex systems]]. Agent-based models are increasingly getting better in predicting the outcome of social complexities of even unknown future scenarios through computer simulations that are based on a collection of mutually interdependent algorithms.<ref>{{cite web|url= https://www.theatlantic.com/magazine/archive/2002/04/seeing-around-corners/302471/| title=Seeing Around Corners|author=Jonathan Rauch|date=1 April 2002|work=[[The Atlantic]]}}</ref><ref>Epstein, J. M., & Axtell, R. L. (1996). Growing Artificial Societies: Social Science from the Bottom Up. A Bradford Book.</ref> Finally, the use of multivariate methods that probe for the latent structure of the data, such as [[factor analysis]] and [[cluster analysis]], have proven useful as analytic approaches that go well beyond the bi-variate approaches (e.g. [[Contingency table|contingency tables]]) typically employed with smaller data sets.
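As a minimal illustration of such a multivariate approach, the sketch below runs a k-means cluster analysis over several variables at once on synthetic data; the segment structure and the library choice (scikit-learn) are assumptions of the example, not part of the cited work.

<syntaxhighlight lang="python">
# Illustrative sketch: multivariate cluster analysis (k-means) probing for
# latent group structure across several variables at once, rather than a
# bivariate contingency table. The data is synthetic; requires scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Three synthetic "segments", each described by four behavioural variables.
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 4)) for m in (0.0, 2.0, 4.0)])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("recovered cluster sizes:", np.bincount(labels))
</syntaxhighlight>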
In health and biology, conventional scientific approaches are based on experimentation. For these approaches, the limiting factor is the relevant data that can confirm or refute the initial hypothesis.<ref>{{cite web|url=http://www.bigdataparis.com/documents/Pierre-Delort-INSERM.pdf#page=5| title=Delort P., Big data in Biosciences, Big Data Paris, 2012|website =Bigdataparis.com |access-date=8 October 2017}}</ref>
A new postulate is accepted now in biosciences: the information provided by the data in huge volumes ([[omics]]) without prior hypothesis is complementary and sometimes necessary to conventional approaches based on experimentation.<ref>{{cite web|url=https://www.cs.cmu.edu/~durand/03-711/2011/Literature/Next-Gen-Genomics-NRG-2010.pdf|title=Next-generation genomics: an integrative approach|date=July 2010|publisher=nature|access-date=18 October 2016}}</ref><ref>{{cite web|url= https://www.researchgate.net/publication/283298499|title=Big Data in Biosciences| date=October 2015|access-date=18 October 2016}}</ref> In the massive approaches it is the formulation of a relevant hypothesis to explain the data that is the limiting factor.<ref>{{cite news|url=https://next.ft.com/content/21a6e7d8-b479-11e3-a09a-00144feabdc0|title=Big data: are we making a big mistake?|date=28 March 2014|work=Financial Times|access-date=20 October 2016}}</ref> The search logic is reversed and the limits of induction ("Glory of Science and Philosophy scandal", [[C. D. Broad]], 1926) are to be considered.{{Citation needed|date=April 2015}}
[[Consumer privacy|Privacy]] advocates are concerned about the threat to privacy represented by increasing storage and integration of [[personally identifiable information]]; expert panels have released various policy recommendations to conform practice to expectations of privacy.<ref>{{cite magazine |first=Paul |last=Ohm |title=Don't Build a Database of Ruin |magazine=Harvard Business Review |url=http://blogs.hbr.org/cs/2012/08/dont_build_a_database_of_ruin.html|date=23 August 2012 }}</ref> The misuse of big data in several cases by media, companies, and even the government has allowed for abolition of trust in almost every fundamental institution holding up society.<ref>Bond-Graham, Darwin (2018). [https://www.theperspective.com/debates/the-perspective-on-big-data/ "The Perspective on Big Data"]. [[The Perspective]].</ref>
Nayef Al-Rodhan argues that a new kind of social contract will be needed to protect individual liberties in the context of big data and giant corporations that own vast amounts of information, and that the use of big data should be monitored and better regulated at the national and international levels.<ref>{{Cite news|url=http://hir.harvard.edu/the-social-contract-2-0-big-data-and-the-need-to-guarantee-privacy-and-civil-liberties/|title=The Social Contract 2.0: Big Data and the Need to Guarantee Privacy and Civil Liberties – Harvard International Review|last=Al-Rodhan|first=Nayef|date=16 September 2014|work=Harvard International Review|access-date=3 April 2017|archive-url=https://web.archive.org/web/20170413090835/http://hir.harvard.edu/the-social-contract-2-0-big-data-and-the-need-to-guarantee-privacy-and-civil-liberties/|archive-date=13 April 2017|url-status=dead}}</ref> Barocas and Nissenbaum argue that one way of protecting individual users is by being informed about the types of information being collected, with whom it is shared, under what constraints and for what purposes.<ref>{{Cite book|title=Big Data's End Run around Anonymity and Consent| last1 =Barocas |first1=Solon |last2=Nissenbaum |first2=Helen|last3=Lane|first3=Julia|last4=Stodden|first4=Victoria|last5=Bender|first5=Stefan|last6=Nissenbaum|first6=Helen| s2cid =152939392|date=June 2014| publisher =Cambridge University Press|isbn=9781107067356|pages=44–75|doi =10.1017/cbo9781107590205.004}}</ref>
===Critiques of the "V" model===
The "V" model of big data is concerning as it centers around computational scalability and lacks in a loss around the perceptibility and understandability of information. This led to the framework of [[cognitive big data]], which characterizes big data applications according to:<ref>{{Cite journal|last1=Lugmayr|first1=Artur|last2=Stockleben|first2=Bjoern|last3=Scheib|first3=Christoph|last4=Mailaparampil|first4=Mathew|last5=Mesia|first5=Noora|last6=Ranta|first6=Hannu|last7=Lab|first7=Emmi|date=1 June 2016|title=A Comprehensive Survey On Big-Data Research and Its Implications – What is Really 'New' in Big Data? – It's Cognitive Big Data! |url=https://www.researchgate.net/publication/304784955}}</ref>
* Data completeness: understanding of the non-obvious from data
* Data correlation, causation, and predictability: causality as not essential requirement to achieve predictability
* Explainability and interpretability: humans desire to understand and accept what they understand, where algorithms do not cope with this
* Level of automated decision making: algorithms that support automated decision making and algorithmic self-learning
         
===Critiques of big data execution===
[[Ulf-Dietrich Reips]] and Uwe Matzat wrote in 2014 that big data had become a "fad" in scientific research.<ref name="pigdata" /> Researcher [[danah boyd]] has raised concerns about the use of big data in science neglecting principles such as choosing a [[Sampling (statistics)|representative sample]] by being too concerned about handling the huge amounts of data.<ref name="danah">{{cite web | url=http://www.danah.org/papers/talks/2010/WWW2010.html | title=Privacy and Publicity in the Context of Big Data | author=danah boyd | work=[[World Wide Web Conference|WWW 2010 conference]] | date=29 April 2010 | access-date = 18 April 2011| author-link=danah boyd }}</ref> This approach may lead to results that have a [[Bias (statistics)|bias]] in one way or another.<ref>{{Cite journal|last=Katyal|first=Sonia K.|date=2019|title=Artificial Intelligence, Advertising, and Disinformation|url=https://muse.jhu.edu/article/745987|journal=Advertising & Society Quarterly|language=en|volume=20|issue=4|doi=10.1353/asr.2019.0026|s2cid=213397212|issn=2475-1790}}</ref> Integration across heterogeneous data resources—some that might be considered big data and others not—presents formidable logistical as well as analytical challenges, but many researchers argue that such integrations are likely to represent the most promising new frontiers in science.<ref>{{cite journal |last1=Jones |first1=MB |last2=Schildhauer |first2=MP |last3=Reichman |first3=OJ |last4=Bowers | first4=S |title=The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere | journal=Annual Review of Ecology, Evolution, and Systematics |volume=37 |issue=1 |pages=519–544 |year=2006 |doi=10.1146/annurev.ecolsys.37.091305.110031 |url= http://www.pnamp.org/sites/default/files/Jones2006_AREES.pdf }}</ref>
In the provocative article "Critical Questions for Big Data",<ref name="danah2">{{cite journal | doi = 10.1080/1369118X.2012.678878| title = Critical Questions for Big Data| journal = Information, Communication & Society| volume = 15| issue = 5| pages = 662–679| year = 2012| last1 = Boyd | first1 = D. | last2 = Crawford | first2 = K. | s2cid = 51843165| hdl = 10983/1320| hdl-access = free}}</ref> the authors title big data a part of [[mythology]]: "large data sets offer a higher form of intelligence and knowledge [...], with the aura of truth, objectivity, and accuracy". Users of big data are often "lost in the sheer volume of numbers", and "working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth".<ref name="danah2" /> Recent developments in BI domain, such as pro-active reporting especially target improvements in the usability of big data, through automated [[Filter (software)|filtering]] of [[spurious relationship|non-useful data and correlations]].<ref name="Big Decisions White Paper">[http://www.fortewares.com/Administrator/userfiles/Banner/forte-wares--pro-active-reporting_EN.pdf Failure to Launch: From Big Data to Big Decisions] {{Webarchive|url=https://web.archive.org/web/20161206145026/http://www.fortewares.com/Administrator/userfiles/Banner/forte-wares--pro-active-reporting_EN.pdf |date=6 December 2016 }}, Forte Wares.</ref> Big structures are full of spurious correlations<ref>{{Cite web | url=https://www.tylervigen.com/spurious-correlations | title=15 Insane Things That Correlate with Each Other}}</ref> either because of non-causal coincidences ([[law of truly large numbers]]), solely nature of big randomness<ref>[https://onlinelibrary.wiley.com/loi/10982418 Random structures & algorithms]</ref> ([[Ramsey theory]]), or existence of [[confounding factor|non-included factors]] so the hope, of early experimenters to make large databases of numbers "speak for themselves" and revolutionize scientific method, is questioned.<ref>Cristian S. Calude, Giuseppe Longo, (2016), The Deluge of Spurious Correlations in Big Data, [[Foundations of Science]]</ref>
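A small simulation makes the point concrete: among many mutually independent noise series, some pairs correlate strongly by chance alone. The sketch below uses only synthetic random data and an arbitrary threshold.

<syntaxhighlight lang="python">
# Illustrative sketch: spurious correlations among purely random series.
# With many unrelated variables, some pairs correlate strongly by chance.
import itertools
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(1)
n_series, n_points = 50, 20
series = [[random.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]

strong_pairs = 0
for i, j in itertools.combinations(range(n_series), 2):
    if abs(statistics.correlation(series[i], series[j])) > 0.6:
        strong_pairs += 1

print(f"pairs of pure-noise series with |r| > 0.6: {strong_pairs} "
      f"out of {n_series * (n_series - 1) // 2}")
</syntaxhighlight>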
     −
Big data analysis is often shallow compared to analysis of smaller data sets.<ref name="kdnuggets-berchthold">{{cite web|url=http://www.kdnuggets.com/2014/08/interview-michael-berthold-knime-research-big-data-privacy-part2.html|title=Interview: Michael Berthold, KNIME Founder, on Research, Creativity, Big Data, and Privacy, Part 2|date=12 August 2014|author=Gregory Piatetsky| author-link= Gregory I. Piatetsky-Shapiro|publisher=KDnuggets|access-date=13 August 2014}}</ref> In many big data projects, there is no large data analysis happening, but the challenge is the [[extract, transform, load]] part of data pre-processing.<ref name="kdnuggets-berchthold" />
      +
与对较小数据集的分析相比,大数据分析往往流于表面。<ref name="kdnuggets-berchthold">{{cite web|url=http://www.kdnuggets.com/2014/08/interview-michael-berthold-knime-research-big-data-privacy-part2.html|title=Interview: Michael Berthold, KNIME Founder, on Research, Creativity, Big Data, and Privacy, Part 2|date=12 August 2014|author=Gregory Piatetsky| author-link= Gregory I. Piatetsky-Shapiro|publisher=KDnuggets|access-date=13 August 2014}}</ref>在许多大数据项目中,真正意义上的大规模数据分析其实并未发生,主要挑战反而在于数据预处理中的提取、转换、加载(ETL)环节。<ref name="kdnuggets-berchthold" />
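作为补充,下面给出一个最简化的ETL草图(文件名、字段名以及所用的pandas流程均为假设,仅用于说明概念,并非某个真实项目的实现):所谓“提取、转换、加载”,指的往往就是这类在真正分析开始之前的数据清洗与整理工作。

<syntaxhighlight lang="python">
import pandas as pd

# 提取:读取原始数据(文件名为假设)
raw = pd.read_csv("events_raw.csv")

# 转换:剔除关键字段缺失的记录、规范时间戳格式、去除重复行
clean = (
    raw.dropna(subset=["user_id", "timestamp"])
       .assign(timestamp=lambda d: pd.to_datetime(d["timestamp"], errors="coerce"))
       .drop_duplicates(subset=["user_id", "timestamp"])
)

# 加载:写出供后续分析使用的数据
# (实际项目中通常写入列式存储或分布式文件系统,这里仅以 CSV 示意)
clean.to_csv("events_clean.csv", index=False)
</syntaxhighlight>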
   −
大数据既是一个时髦词,也是一个“模糊术语”,<ref>{{cite news|last1=Pelt|first1=Mason|title="Big Data" is an over used buzzword and this Twitter bot proves it|url= http://siliconangle.com/blog/2015/10/26/big-data-is-an-over-used-buzzword-and-this-twitter-bot-proves-it/ |newspaper=Siliconangle|access-date=4 November 2015|date=26 October 2015}}</ref><ref name="ft-harford">{{cite web |url=http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html |title=Big data: are we making a big mistake? |last1=Harford |first1=Tim |date=28 March 2014 |website=[[Financial Times]] |access-date=7 April 2014}}</ref>但同时也是企业家、咨询顾问、科学家和媒体“痴迷”的对象。<ref name="ft-harford" />谷歌流感趋势(Google Flu Trends)等大数据样板应用近年来未能给出良好的预测,把流感疫情的规模高估为实际的两倍。类似地,仅基于Twitter数据做出的奥斯卡奖和选举预测也常常失准。

大数据往往与小数据面临同样的挑战:增加数据量并不能解决偏差问题,反而可能凸显其他问题。特别是,Twitter之类的数据源并不能代表总体人群,从中得出的结果可能导致错误结论。基于海量文本统计分析的谷歌翻译在翻译网页方面表现不错,但对专业领域文本的翻译结果可能严重失真。

另一方面,大数据也可能带来新的问题,例如多重比较问题:同时检验大量假设,很可能产生许多看似“显著”、实则错误的结果。Ioannidis认为“大多数已发表的研究结果都是错误的”,<ref name="Ioannidis">{{cite journal | vauthors = Ioannidis JP | title = Why most published research findings are false | journal = PLOS Medicine | volume = 2 | issue = 8 | pages = e124 | date = August 2005 | pmid = 16060722 | pmc = 1182327 | doi = 10.1371/journal.pmed.0020124 | author-link1 = John P. A. Ioannidis }}</ref>其原因本质上相同:当许多科学团队和研究人员各自进行大量实验(即处理大量科学数据,尽管并未使用大数据技术)时,某个“显著”结果实际为假的可能性会迅速上升;如果只有阳性结果才被发表,这种情况会更加严重。
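下面用一个极简的模拟来说明上述多重比较问题(这只是示意性草图,检验次数、样本量和显著性水平均为假设,并非所引研究的实际做法):对完全由噪声构成的数据同时做大量假设检验,在0.05的显著性水平下仍会有约5%的检验显得“显著”,而这些全部是假阳性;若随后只报告这些“阳性”结果,读者看到的便几乎全是错误结论。

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# 示意多重比较问题:对纯噪声同时做大量检验
rng = np.random.default_rng(42)
n_tests, n_samples = 1000, 50          # 假设做 1000 次检验,每组 50 个样本

false_positives = 0
for _ in range(n_tests):
    # 两组样本来自同一分布,原假设(无差异)为真
    a = rng.normal(size=n_samples)
    b = rng.normal(size=n_samples)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:                       # 未做任何多重比较校正
        false_positives += 1

print(f"{n_tests} 次检验中有 {false_positives} 个“显著”结果,且全部为假阳性")
# 期望值约为 n_tests * 0.05 ≈ 50 个
</syntaxhighlight>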
   −
此外,大数据分析的结果好坏,取决于其所依据的模型。例如,大数据曾被用于预测2016年美国总统选举的结果,<ref>{{Cite news|url=https://www.nytimes.com/2016/11/10/technology/the-data-said-clinton-would-win-why-you-shouldnt-have-believed-it.html|title=How Data Failed Us in Calling an Election |last1=Lohr|first1=Steve|date=10 November 2016|last2=Singer|first2=Natasha|newspaper=The New York Times|issn=0362-4331|access-date=27 November 2016}}</ref>但各方预测的成功程度参差不齐。
      
=== 对大数据警务与监控的批评 ===
 
执法部门和企业等机构已将大数据用于警务和监视。<ref>{{Cite news|url=https://www.economist.com/open-future/2018/06/04/how-data-driven-policing-threatens-human-freedom|title=How data-driven policing threatens human freedom|date=4 June 2018|newspaper=The Economist|access-date=27 October 2019|issn=0013-0613}}</ref>与传统警务手段相比,基于数据的监控可见度更低,因此也更不容易引发针对大数据警务的反对。根据Sarah Brayne的《大数据监控:警务案例 Big Data Surveillance: The Case of Policing》,<ref>{{Cite journal|last=Brayne|first=Sarah|s2cid=3609838|date=29 August 2017|title=Big Data Surveillance: The Case of Policing|journal=American Sociological Review |volume=82|issue=5|pages=977–1008|language=en|doi=10.1177/0003122417725865}}</ref>大数据警务可能通过以下三种方式再生产并加剧现有的社会不平等:
   −
*以“算法出自数学计算、因而不带偏见”为理由,将一些人置于更严格的监视之下。
   −
*增加执法跟踪的范围和人数,并加剧刑事司法系统中存在的特定种族比例过高的现象。
   −
*鼓励社会成员放弃与产生数字痕迹的机构的互动,从而为社会包容制造障碍。
    
如果这些潜在的问题得不到纠正或规范,大数据警务的影响可能会继续固化社会阶层分化。Brayne还指出,审慎地使用大数据警务可以防止个人层面的偏见演变为制度层面的偏见。
 
   −
==在流行文化中==
===书籍===
*《点球成金 Moneyball》是一本非虚构作品,探讨了奥克兰运动家棒球队如何利用统计分析战胜预算更充足的球队。2011年,由布拉德·皮特主演的改编电影上映。
   −
===电影===
 +
*在《美国队长:寒冬战士 Captain America: The Winter Soldier》中,H.Y.D.R.A (伪装成神盾局)开发了一种利用数据来确定和消除全球威胁的飞行母舰。
   −
*在《蝙蝠侠: 黑暗骑士 The Dark Knight》中,蝙蝠侠使用的声纳设备可以监视整个哥谭市,这些数据是通过市内居民的手机收集的。
   −
== 参见 ==
{{columns-list|colwidth=26em|
*[[大数据伦理]]
*[[大数据成熟度模型]]
*[[大内存]]
*[[数据整理]]
*[[数据定义存储]]
*[[数据谱系]]
*[[数据慈善]]
*[[数据科学]]
*[[数据化]]
*[[面向文档的数据库]]
*[[内存处理]]
*[[大数据公司名单]]
*[[城市信息学]]
*[[超大数据库]]
*[[XLDB]]}}
      
== 参考文献 ==
 
}}
 
==拓展材料==
 
*{{cite magazine|editor1=Peter Kinnaird |editor2=Inbal Talgam-Cohen|magazine=[[XRDS (magazine)|XRDS: Crossroads, The ACM Magazine for Students]]|title=Big Data|volume=19 |issue=1|date=2012|publisher=[[Association for Computing Machinery]]|issn=1528-4980 |oclc=779657714 |url=http://dl.acm.org/citation.cfm?id=2331042}}
*{{cite book|title=Mining of massive datasets|author1=Jure Leskovec|author2-link=Anand Rajaraman|author2=Anand Rajaraman|author3-link=Jeffrey D. Ullman|author3=Jeffrey D. Ullman|year=2014|publisher=Cambridge University Press|url=http://mmds.org/|isbn=9781107077232 |oclc=888463433|author1-link=Jure Leskovec}}
[[Category:大数据]]
[[Category:数据管理]]
[[Category:分布式计算问题]]
[[Category:事务处理]]
[[Category:技术预测]]
[[Category:数据分析]]
[[Category:数据库]]