更改

删除191字节 、 2022年2月7日 (一) 20:35
V0.7_20220207_翻译
第1行: 第1行: −
This article is about large collections of data. For the band, see Big Data (band). For the practice of buying and selling of personal and consumer data, see Surveillance capitalism.[[File:Hilbert InfoGrowth.png|thumb|right|400px|Non-linear growth of digital global information-storage capacity and the waning of analog storage<ref>{{cite journal|url= http://www.martinhilbert.net/WorldInfoCapacity.html|title= The World's Technological Capacity to Store, Communicate, and Compute Information|volume= 332|issue= 6025|pages= 60–65|journal=Science|access-date= 13 April 2016|bibcode= 2011Sci...332...60H|last1= Hilbert|first1= Martin|last2= López|first2= Priscila|year= 2011|doi= 10.1126/science.1200970|pmid= 21310967|s2cid= 206531385}}</ref>全球数字信息存储容量的非线性增长和模拟存储的减弱|链接=Special:FilePath/Hilbert_InfoGrowth.png]]
+
This article is about large collections of data. For the band, see Big Data (band). For the practice of buying and selling of personal and consumer data, see Surveillance capitalism.[[File:Hilbert InfoGrowth.png|thumb|right|400px|Non-linear growth of digital global information-storage capacity and the waning of analog storage<ref>{{cite journal|url= http://www.martinhilbert.net/WorldInfoCapacity.html|title= The World's Technological Capacity to Store, Communicate, and Compute Information|volume= 332|issue= 6025|pages= 60–65|journal=Science|access-date= 13 April 2016|bibcode= 2011Sci...332...60H|last1= Hilbert|first1= Martin|last2= López|first2= Priscila|year= 2011|doi= 10.1126/science.1200970|pmid= 21310967|s2cid= 206531385}}</ref>全球数字信息存储容量的非线性增长和模拟存储的减少。|链接=Special:FilePath/Hilbert_InfoGrowth.png]]
    
'''Big data''' is a field that treats ways to analyze, systematically extract information from, or otherwise deal with [[data set]]s that are too large or complex to be dealt with by traditional [[data processing|data-processing]] [[application software]]. Data with many entries (rows) offer greater [[statistical power]], while data with higher complexity (more attributes or columns) may lead to a higher [[false discovery rate]].<ref>{{Cite journal|last=Breur|first=Tom|date=July 2016|title=Statistical Power Analysis and the contemporary "crisis" in social sciences|journal=Journal of Marketing Analytics |publisher=[[Palgrave Macmillan]]|location=London, England|volume=4 |issue=2–3 |pages=61–65 |doi=10.1057/s41270-016-0001-3 |issn=2050-3318|doi-access=free}}</ref> Big data analysis challenges include [[Automatic identification and data capture|capturing data]], [[Computer data storage|data storage]], [[data analysis]], search, [[Data sharing|sharing]], [[Data transmission|transfer]], [[Data visualization|visualization]], [[Query language|querying]], updating, [[information privacy]], and data source. Big data was originally associated with three key concepts: ''volume'', ''variety'', and ''velocity''.<ref name=":0" /> The analysis of big data presents challenges in sampling, whereas earlier approaches allowed only for observation and sampling. Therefore, big data often includes data whose size exceeds the capacity of traditional software to process within an acceptable time and ''value''.
第295行: 第295行:  
多维大数据也可以表示为 OLAP 数据立方体或者数学上的张量。阵列数据库系统已经着手为这种数据类型提供存储和高级查询支持。其他应用于大数据的技术包括高效的基于张量的计算,如多线性子空间学习、大规模并行处理(MPP)数据库、基于搜索的应用程序、数据挖掘、分布式文件系统、分布式缓存(如突发缓冲区和 Memcached)、分布式数据库、基于云和 hpc 的基础设施(应用程序、存储和计算资源) ,以及互联网。虽然已经开发了许多方法和技术,但是仍然很难实现大数据的机器学习。
   −
'''''【终译版】'''''多维大数据也可以表示为OLAP数据立方体或数学上的张量。阵列数据库系统已经开始提供这种数据类型的存储和高级查询支持。应用于大数据的其他技术包括基于张量的高效计算,如多线性子空间学习、大规模并行处理(MPP)数据库、基于搜索的应用、数据挖掘、分布式文件系统、分布式缓存(如burst buffer和Memcached)、分布式数据库、,基于云和HPC的基础设施(应用程序、存储和计算资源)以及互联网。尽管已经开发了许多方法和技术,但使用大数据进行机器学习仍然很困难。
+
'''''【终译版】'''''多维大数据也可以表示为OLAP数据立方体或数学上的张量。阵列数据库系统已经支持这种数据类型的存储和高级查询。应用于大数据的其他技术包括基于张量的高效计算,如多线性子空间学习、大规模并行处理(MPP)数据库、基于搜索的应用、数据挖掘、分布式文件系统、分布式缓存(如burst buffer和Memcached)、分布式数据库,基于云和HPC的基础设施(应用程序、存储和计算资源)以及互联网。尽管已经开发了许多方法和技术,但使用大数据进行机器学习仍然很困难。
    
Some [[Massive parallel processing|MPP]] relational databases have the ability to store and manage petabytes of data. Implicit is the ability to load, monitor, back up, and optimize the use of the large data tables in the [[RDBMS]].<ref>{{cite web |author=Monash, Curt |title=eBay's two enormous data warehouses |date=30 April 2009 |url=http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/}}<br />{{cite web |author=Monash, Curt |title=eBay followup&nbsp;– Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more |date=6 October 2010 |url =http://www.dbms2.com/2010/10/06/ebay-followup-greenplum-out-teradata-10-petabytes-hadoop-has-some-value-and-more/}}</ref>{{promotional source|date=December 2018}}
第303行: 第303行:  
一些 MPP 关系数据库具有存储和管理 pb 级数据的能力。隐式是加载、监视、备份和优化 RDBMS 中大型数据表的使用的能力。< br/>
   −
'''''【终译版】'''''一些MPP关系数据库能够存储和管理数PB的数据。隐式是加载、监视、备份和优化RDBMS中大型数据表使用的能力。
+
'''''【终译版】'''''一些MPP关系型数据库能够存储和管理PB级的数据。隐式是加载、监视、备份和优化RDBMS中大型数据表使用的能力。
    
[[DARPA]]'s [[Topological Data Analysis]] program seeks the fundamental structure of massive data sets and in 2008 the technology went public with the launch of a company called "Ayasdi".<ref>{{cite web|url=http://www.ayasdi.com/resources/|title=Resources on how Topological Data Analysis is used to analyze big data|publisher=Ayasdi}}</ref>{{thirdpartyinline|date=December 2018}}
第311行: 第311行:  
美国国防部高级研究计划局的拓扑数据分析计划寻找海量数据集的基本结构。2008年,随着一家名为“ Ayasdi”的公司的成立,这项技术公之于众。
   −
'''''【终译版】'''''DARPA的拓扑数据分析项目寻求海量数据集的基本结构,2008年,这项技术随着一家名为“Ayasdi”的公司的成立而上市。
+
'''''【终译版】'''''DARPA的拓扑数据分析项目寻求海量数据集的基本结构,2008年,这项技术随着一家名为“Ayasdi”的公司的成立而公之于众。
    
The practitioners of big data analytics processes are generally hostile to slower shared storage,<ref>{{cite web |title=Storage area networks need not apply |author=CNET News |date=1 April 2011 |url=http://news.cnet.com/8301-21546_3-20049693-10253464.html}}</ref> preferring direct-attached storage ([[Direct-attached storage|DAS]]) in its various forms from solid state drive ([[SSD]]) to high capacity [[Serial ATA|SATA]] disk buried inside parallel processing nodes. The perception of shared storage architectures—[[storage area network]] (SAN) and [[network-attached storage]] (NAS)— is that they are relatively slow, complex, and expensive. These qualities are not consistent with big data analytics systems that thrive on system performance, commodity infrastructure, and low cost.
第319行: 第319行:  
大数据分析处理的从业者通常不喜欢缓慢的共享存储,他们更喜欢各种形式的直接连接的存储设备,从固态硬盘(SSD)到埋藏在并行处理节点中的大容量 SATA 磁盘。对于共享存储架构ーー存储区域网络(SAN)和存储网络附加存储(NAS)ーー的看法是,它们相对缓慢、复杂和昂贵。这些特性与依赖于系统性能、商品基础设施和低成本的大数据分析系统不一致。
   −
'''''【终译版】'''''大数据分析流程的从业者通常不喜欢速度较慢的共享存储,他们更喜欢各种形式的直连存储(DAS),从固态驱动器(SSD)到埋入并行处理节点中的高容量SATA磁盘。共享存储体系结构存储区域网络(SAN)和网络连接存储(NAS)的概念是它们相对缓慢、复杂且昂贵。这些品质与依赖系统性能、商品基础设施和低成本的大数据分析系统不一致。
+
'''''【终译版】'''''大数据分析流程的从业者通常不喜欢速度较慢的共享存储,他们更喜欢各种形式的直连存储(DAS),从固态驱动器(SSD)到加入并行处理节点中的高容量SATA磁盘。共享存储体系结构存储区域网络(SAN)和网络连接存储(NAS)的概念是它们相对缓慢、复杂且昂贵。这些品质与依赖系统性能、商品基础设施和低成本的大数据分析系统不一致。
    
Real-time or near-real-time information delivery is one of the defining characteristics of big data analytics. Latency is therefore avoided whenever and wherever possible. Data in direct-attached memory or disk is good; data on memory or disk at the other end of an [[Fiber connector|FC]] [[Storage area network|SAN]] connection is not. The cost of a [[Storage area network|SAN]] at the scale needed for analytics applications is much higher than that of other storage techniques.
第327行: 第327行:  
实时或接近实时的信息传递是大数据分析的定义特征之一。因此,无论何时何地,只要有可能,就可以避免延迟。直接连接的存储器或磁盘中的数据是好的ーー FC SAN 连接另一端的存储器或磁盘上的数据是坏的。在分析应用程序所需的规模上,SAN 的成本要比其他存储技术高得多。
   −
'''''【终译版】'''''实时或近实时信息交付是大数据分析的定义特征之一。因此,无论何时何地都可以避免延迟。直连内存或磁盘中的数据是良好的FC SAN连接另一端的内存或磁盘中的数据不是。分析应用程序所需规模的SAN的成本比其他存储技术高得多。
+
'''''【终译版】'''''实时或近实时信息交付是大数据分析的特征之一。因此,无论何时何地都可以避免延迟。直连内存或磁盘中的数据是良好的,而FC SAN连接另一端的内存或磁盘中的数据则不是。分析应用程序所需规模的SAN的成本比其他存储技术高得多。
    
==Applications==
第334行: 第334行:  
Big data has increased the demand for information management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year, about twice as fast as the software business as a whole.
   −
大数据极大地增加了信息管理专家的需求,以至于 Software AG、甲骨文公司、 IBM、微软、 SAP、 EMC、惠普和戴尔已经在数据管理和分析软件公司上花费了超过150亿美元。在2010年,这个行业价值超过1000亿美元,并且以每年近10% 的速度增长: 大约是整个软件行业的两倍。'''''【终译版】'''''。大数据极大地增加了对信息管理专家的需求,以至于Software AG、Oracle Corporation、IBM、Microsoft、SAP、EMC、HP和Dell在专门从事数据管理和分析的软件公司上花费了150多亿美元。2010年,这个行业的价值超过1000亿美元,并以每年近10%的速度增长:大约是整个软件业务的两倍。
+
大数据极大地增加了信息管理专家的需求,以至于 Software AG、甲骨文公司、 IBM、微软、 SAP、 EMC、惠普和戴尔已经在数据管理和分析软件公司上花费了超过150亿美元。在2010年,这个行业价值超过1000亿美元,并且以每年近10% 的速度增长: 大约是整个软件行业的两倍。
 +
 
 +
'''''【终译版】'''''大数据极大地增加了对信息管理专家的需求,以至于Software AG、甲骨文、IBM、微软、SAP、EMC、惠普和戴尔在专门从事数据管理和分析的软件公司上花费了150多亿美元。2010年,这个行业的价值超过1000亿美元,并以每年近10%的速度增长:大约是整个软件行业的两倍。
    
Developed economies increasingly use data-intensive technologies. There are 4.6&nbsp;billion mobile-phone subscriptions worldwide, and between 1&nbsp;billion and 2&nbsp;billion people accessing the internet.{{r|Economist}} Between 1990 and 2005, more than 1&nbsp;billion people worldwide entered the middle class, which means more people became more literate, which in turn led to information growth. The world's effective capacity to exchange information through telecommunication networks was 281 [[petabytes]] in 1986, 471 [[petabytes]] in 1993, 2.2 exabytes in 2000, 65 [[exabytes]] in 2007<ref name="martinhilbert.net"/> and predictions put the amount of internet traffic at 667 exabytes annually by 2014.{{r|Economist}} According to one estimate, one-third of the globally stored information is in the form of alphanumeric text and still image data,<ref name="HilbertContent">{{cite journal|title= What is the Content of the World's Technologically Mediated Information and Communication Capacity: How Much Text, Image, Audio, and Video?| doi= 10.1080/01972243.2013.873748 | volume=30| issue=2 |journal=The Information Society|pages=127–143|year = 2014|last1 = Hilbert|first1 = Martin| s2cid= 45759014 | url= https://escholarship.org/uc/item/87w5f6wb }}</ref> which is the format most useful for most big data applications. This also shows the potential of yet unused data (i.e. in the form of video and audio content).
第340行: 第342行:  
Developed economies increasingly use data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people accessing the internet. Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means more people became more literate, which in turn led to information growth. The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, 65 exabytes in 2007 and predictions put the amount of internet traffic at 667 exabytes annually by 2014. According to one estimate, one-third of the globally stored information is in the form of alphanumeric text and still image data, which is the format most useful for most big data applications. This also shows the potential of yet unused data (i.e. in the form of video and audio content).
   −
发达经济体越来越多地使用数据密集型技术。全世界有46亿移动电话用户,10亿到20亿人使用互联网。从1990年到2005年,全世界有超过10亿人进入中产阶级,这意味着更多的人变得更有文化,进而导致信息增长。世界通过电信网络交换信息的有效容量在1986年为281千兆字节,1993年为471千兆字节,2000年为2.2千兆字节,2007年为65千兆字节,预计到2014年每年的互联网流量将达到667千兆字节。据估计,全球储存的信息有三分之一是字母数字文本和静止图像数据,这是大多数大数据应用程序最有用的格式。这也显示了尚未使用的数据的潜力(即。以视频和音频内容的形式)。'''''【终译版】'''''。发达经济体越来越多地使用数据密集型技术。全世界有46亿手机用户,有10亿到20亿人上网。从1990年到2005年,全世界有超过10亿人进入中产阶级,这意味着更多的人变得更识字,这反过来又导致了信息的增长。1986年,世界通过电信网络交换信息的有效容量为281 PB,1993年为471 PB,2000年为2.2 EB,2007年为65 EB。据预测,到2014年,互联网流量将达到每年667 EB。据估计,全球存储信息的三分之一是字母数字文本和静态图像数据,这是大多数大数据应用最有用的格式。这也显示了尚未使用的数据(即以视频和音频内容的形式)的潜力。
+
发达经济体越来越多地使用数据密集型技术。全世界有46亿移动电话用户,10亿到20亿人使用互联网。从1990年到2005年,全世界有超过10亿人进入中产阶级,这意味着更多的人变得更有文化,进而导致信息增长。世界通过电信网络交换信息的有效容量在1986年为281千兆字节,1993年为471千兆字节,2000年为2.2千兆字节,2007年为65千兆字节,预计到2014年每年的互联网流量将达到667千兆字节。据估计,全球储存的信息有三分之一是字母数字文本和静止图像数据,这是大多数大数据应用程序最有用的格式。这也显示了尚未使用的数据的潜力(即。以视频和音频内容的形式)。
 +
 
 +
'''''【终译版】'''''发达经济体越来越多地使用数据密集型技术。全世界有46亿手机用户,有10亿到20亿人上网。从1990年到2005年,全世界有超过10亿人进入中产阶级,这意味着更多的人变得更有文化,进而导致了信息的增长。1986年,世界通过电信网络交换信息的有效容量为281 PB,1993年为471 PB,2000年为2.2 EB,2007年为65 EB。据预测,到2014年,互联网流量将达到每年667 EB。据估计,全球存储信息的三分之一是字母数字文本和静态图像数据,这是大多数大数据应用最有用的格式。这也显示了尚未使用的(以视频和音频内容的形式)数据的潜力。
    
While many vendors offer off-the-shelf products for big data, experts promote the development of in-house custom-tailored systems if the company has sufficient technical capabilities.<ref>{{cite web |url=http://www.kdnuggets.com/2014/07/interview-amy-gershkoff-ebay-in-house-BI-tools.html |title=Interview: Amy Gershkoff, Director of Customer Analytics & Insights, eBay on How to Design Custom In-House BI Tools |last1=Rajpurohit |first1=Anmol |date=11 July 2014 |website= KDnuggets|access-date=14 July 2014|quote=Generally, I find that off-the-shelf business intelligence tools do not meet the needs of clients who want to derive custom insights from their data. Therefore, for medium-to-large organizations with access to strong technical talent, I usually recommend building custom, in-house solutions.}}</ref>
第346行: 第350行:  
While many vendors offer off-the-shelf products for big data, experts promote the development of in-house custom-tailored systems if the company has sufficient technical capabilities.
   −
虽然许多供应商提供现成的大数据产品,但如果公司拥有足够的技术能力,专家则推动开发内部定制系统。'''''【终译版】'''''。虽然许多供应商为大数据提供现成的产品,但如果公司有足够的技术能力,专家会推动内部定制系统的开发。
+
虽然许多供应商提供现成的大数据产品,但如果公司拥有足够的技术能力,专家则推动开发内部定制系统。
 +
 
 +
'''''【终译版】'''''虽然许多供应商为大数据提供现成的产品,但如果公司有足够的技术能力,专家则提倡开发内部定制系统。
 
===Government===
 +
 +
=== 政府 ===
 
The use and adoption of big data within governmental processes allows efficiencies in terms of cost, productivity, and innovation,<ref>{{cite magazine|url =http://www.computerworld.com/article/2472667/government-it/the-government-and-big-data--use--problems-and-potential.html |title=The Government and big data: Use, problems and potential |date=21 March 2012 |magazine=[[Computerworld]] |access-date=12 September 2016}}</ref> but is not without its flaws. Data analysis often requires multiple parts of government (central and local) to work in collaboration and create new and innovative processes to deliver the desired outcome. A common government organization that makes use of big data is the National Security Agency ([[National Security Agency|NSA]]), which monitors the activities of the Internet constantly in search of potential patterns of suspicious or illegal activities that its systems may pick up.
   第354行: 第362行:  
在政府流程中使用和采用大数据可以在成本、生产力和创新方面提高效率,但也存在缺陷。数据分析往往需要多个政府部门(中央和地方)协同工作,创建新的创新流程,以实现预期成果。利用大数据的一个常见政府组织是国家安全局,该局不断监测互联网的活动,以搜索其系统可能发现的可疑或非法活动的潜在模式。
   −
'''''【终译版】'''''。在政府流程中使用和采用大数据可以提高成本、生产率和创新效率,但也并非没有缺陷。数据分析通常需要政府的多个部门(中央和地方)合作,创建新的创新流程,以实现预期结果。国家安全局(NSA)是一个利用大数据的常见政府组织,它不断监控互联网的活动,寻找其系统可能发现的可疑或非法活动的潜在模式。
+
'''''【终译版】'''''在政府流程中应用大数据可以提高成本、生产率和创新效率,但也并非没有缺陷。数据分析通常需要政府的多个部门(中央和地方)合作,创建新的流程以实现预期结果。国家安全局(NSA)是一个利用大数据的常见政府组织,它不断监控互联网的活动,寻找其系统可能发现的可疑或非法活动的潜在模式。
    
[[Civil registration and vital statistics]] (CRVS) collects the status of all certificates from birth to death. CRVS is a source of big data for governments.
第362行: 第370行:  
民事登记和人口动态统计收集从出生到死亡的所有证明状态。民事登记和人口动态统计系统是政府大数据的一个来源。
   −
'''''【终译版】'''''。民事登记和人口动态统计(CRVS)收集从出生到死亡的所有身份证明。CRV是政府大数据的来源。
+
'''''【终译版】'''''民事登记和人口动态统计(CRVS)收集从出生到死亡的所有证明文件状态。CRVS是政府大数据的来源。
    
===International development===
 +
 +
=== 国际发展 ===
 
Research on the effective usage of information and communication technologies for development (also known as "ICT4D") suggests that big data technology can make important contributions but also present unique challenges to [[international development]].<ref>{{cite web| url=http://www.unglobalpulse.org/projects/BigDataforDevelopment |title=White Paper: Big Data for Development: Opportunities & Challenges (2012) – United Nations Global Pulse| website=Unglobalpulse.org |access-date=13 April 2016}}</ref><ref>{{cite web| title=WEF (World Economic Forum), & Vital Wave Consulting. (2012). Big Data, Big Impact: New Possibilities for International Development|work= World Economic Forum|access-date=24 August 2012| url= http://www.weforum.org/reports/big-data-big-impact-new-possibilities-international-development}}</ref> Advancements in big data analysis offer cost-effective opportunities to improve decision-making in critical development areas such as health care, employment, [[economic productivity]], crime, security, and [[natural disaster]] and resource management.<ref name="HilbertBigData2013" /><ref>{{cite web|url=http://blogs.worldbank.org/ic4d/four-ways-to-talk-about-big-data/|title=Elena Kvochko, Four Ways To talk About Big Data (Information Communication Technologies for Development Series)|publisher=worldbank.org|access-date=30 May 2012|date=4 December 2012}}</ref><ref>{{cite web| title=Daniele Medri: Big Data & Business: An on-going revolution| url=http://www.statisticsviews.com/details/feature/5393251/Big-Data--Business-An-on-going-revolution.html| publisher=Statistics Views| date=21 October 2013| access-date=21 June 2015| archive-date=17 June 2015| archive-url=https://web.archive.org/web/20150617211645/http://www.statisticsviews.com/details/feature/5393251/Big-Data--Business-An-on-going-revolution.html| url-status=dead}}</ref> Additionally, user-generated data offers new opportunities to give the unheard a voice.<ref>{{cite web|title=Responsible use of data|author=Tobias Knobloch and Julia Manske|work= D+C, Development and Cooperation|date=11 January 2016|url= http://www.dandc.eu/en/article/opportunities-and-risks-user-generated-and-automatically-compiled-data}}</ref> However, longstanding challenges for developing regions such as inadequate technological infrastructure and economic and human resource scarcity exacerbate existing concerns with big data such as privacy, imperfect methodology, and interoperability issues.<ref name="HilbertBigData2013" /> The challenge of "big data for development"<ref name="HilbertBigData2013" /> is currently evolving toward the application of this data through machine learning, known as "artificial intelligence for development" (AI4D).<ref>Mann, S., & Hilbert, M. (2020). AI4D: Artificial Intelligence for Development. International Journal of Communication, 14(0), 21. https://www.martinhilbert.net/ai4d-artificial-intelligence-for-development/</ref>
   第371行: 第381行:  
= = = 国际发展 = = 关于有效利用信息和通信技术促进发展的研究(又称“ ICT4D”)表明,大数据技术可以作出重要贡献,但也对国际发展提出独特的挑战。海量数据分析的进步为改善关键发展领域的决策提供了成本效益高的机会,这些领域包括保健、就业、经济生产力、犯罪、安全、自然灾害和资源管理。此外,用户生成的数据提供了新的机会,给未听到的声音。然而,发展中地区面临的长期挑战,如技术基础设施不足、经济和人力资源稀缺,加剧了人们对大数据的现有担忧,如隐私、方法不完善以及互操作性问题。“大数据促进发展”的挑战目前正朝着通过机器学习(被称为“人工智能促进发展(AI4D)”)应用这些数据的方向发展。希尔伯特 · 曼(2020)。AI4D: 人工智能促进发展。国际通信杂志,14(0) ,21.  https://www.martinhilbert.net/ai4d-artificial-intelligence-for-development/
   −
'''''【终译版】'''''。关于有效利用信息和通信技术促进发展(也称为“ICT4D”)的研究表明,大数据技术可以做出重要贡献,但也对国际发展提出了独特的挑战。大数据分析的进步为改善关键发展领域的决策提供了成本效益高的机会,如医疗保健、就业、经济生产率、犯罪、安全、自然灾害和资源管理。此外,用户生成的数据提供了新的机会,让闻所未闻的声音。然而,发展中地区面临的长期挑战,如技术基础设施不足、经济和人力资源匮乏,加剧了对大数据的现有担忧,如隐私、不完善的方法和互操作性问题。“大数据促进发展”的挑战目前正朝着通过机器学习应用这些数据的方向发展,称为“人工智能促进发展”(AI4D)。Mann,S.,和Hilbert,M.(2020)。AI4D:人工智能促进发展。国际通讯杂志,14(0),21。<nowiki>https://www.martinhilbert.net/ai4d-artificial-intelligence-for-development/</nowiki>
+
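'''''[Final translation]''''' Research on the effective use of information and communication technologies for development (also known as "ICT4D") suggests that big data technology can make important contributions but also presents unique challenges to international development. Advances in big data analysis offer highly cost-effective opportunities to improve decision-making in critical development areas such as health care, employment, economic productivity, crime, security, natural disasters, and resource management. In addition, user-generated data offers new opportunities to give the unheard a voice. However, longstanding challenges for developing regions, such as inadequate technological infrastructure and scarce economic and human resources, exacerbate existing concerns about big data such as privacy, imperfect methodology, and interoperability issues. The challenge of "big data for development" is currently evolving toward the application of this data through machine learning, known as "artificial intelligence for development" (AI4D).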
    
====Benefits====
 
 +
 +
=== Benefits ===
 
A major practical application of big data for development has been "fighting poverty with data".<ref>Blumenstock, J. E. (2016). Fighting poverty with data. Science, 353(6301), 753–754. https://doi.org/10.1126/science.aah5217</ref> In 2015, Blumenstock and colleagues predicted poverty and wealth from mobile phone metadata,<ref>Blumenstock, J., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350(6264), 1073–1076. https://doi.org/10.1126/science.aac4420</ref> and in 2016 Jean and colleagues combined satellite imagery and machine learning to predict poverty.<ref>Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894</ref> Using digital trace data to study the labor market and the digital economy in Latin America, Hilbert and colleagues<ref name="HilbertJobMarket">Hilbert, M., & Lu, K. (2020). The online job market trace in Latin America and the Caribbean (UN ECLAC LC/TS.2020/83; p. 79). United Nations Economic Commission for Latin America and the Caribbean. https://www.cepal.org/en/publications/45892-online-job-market-trace-latin-america-and-caribbean</ref><ref>UN ECLAC, (United Nations Economic Commission for Latin America and the Caribbean). (2020). Tracking the digital footprint in Latin America and the Caribbean: Lessons learned from using big data to assess the digital economy (Productive Development, Gender Affairs LC/TS.2020/12; Documentos de Proyecto). United Nations ECLAC. https://repositorio.cepal.org/handle/11362/45484</ref> argue that digital trace data has several benefits, such as:
 
* Thematic coverage: including areas that were previously difficult or impossible to measure
 
Line 396: Line 408:
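A major practical application of big data for development has been "fighting poverty with data". In 2015, Blumenstock and colleagues predicted poverty and wealth from mobile phone metadata, and in 2016 Jean and colleagues combined satellite imagery and machine learning to predict poverty. Using digital trace data to study the labor market and the digital economy in Latin America, Hilbert and colleagues argue that digital trace data has several benefits, such as: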
   −
Thematic reporting: including areas that were previously difficult or impossible to measure
+
* Thematic coverage: including areas that were previously difficult or impossible to measure.
 +
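* Geographical coverage: our international sources provided sizable and comparable data for almost all countries, including many small countries that are usually not included in international inventories.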
 +
* Level of detail: providing fine-grained data with many interrelated variables and new aspects, such as network connections.
 +
* Timeliness: graphs can be produced within days of the data being collected.
   −
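Geographical coverage: our international sources provided sizable and comparable data for almost all countries, including many small countries that are usually not included in international inventories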
+
====Challenges====
 
  −
Level of detail: providing fine-grained data with many interrelated variables and new aspects, such as network connections
  −
 
  −
Timeliness and time series: graphs can be produced within days of the data being collected
     −
====Challenges====
+
=== Challenges ===
 
At the same time, working with digital trace data instead of traditional survey data does not eliminate the traditional challenges involved when working in the field of international quantitative analysis. Priorities change, but the basic discussions remain the same. Among the main challenges are:
 
 
* Representativeness. While traditional development statistics is mainly concerned with the representativeness of random survey samples, digital trace data is never a random sample.<ref>{{Cite journal|last1=Banerjee|first1=Amitav|last2=Chaudhury|first2=Suprakash|date=2010|title=Statistics without tears: Populations and samples|journal=Industrial Psychiatry Journal|volume=19|issue=1|pages=60–65|doi=10.4103/0972-6748.77642|issn=0972-6748|pmc=3105563|pmid=21694795}}</ref>  
 
Line 425: Line 436:
'''''[Final translation]'''''
   −
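At the same time, working with digital trace data instead of traditional survey data does not eliminate the traditional challenges involved in international quantitative analysis. Priorities change, but the basic discussions remain the same. The main challenges include: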
+
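Meanwhile, working with digital trace data instead of traditional survey data does not eliminate the traditional challenges involved in international quantitative analysis. The order of priorities has changed, but the basic challenges remain unchanged. The main challenges include: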
   −
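Representativeness. While traditional development statistics is mainly concerned with the representativeness of random survey samples, digital trace data is never a random sample.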
+
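* Representativeness. While traditional development statistics is mainly concerned with the representativeness of random survey samples, digital trace data is never a random sample.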
 +
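* Generalizability. While observational data always represents its source very well, it only represents what it represents. While it is tempting to generalize from specific observations of one platform to broader settings, this is often very deceptive.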
 +
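* Harmonization. Digital trace data still requires the international harmonization of indicators. It adds the challenge of "data fusion", the harmonization of different sources.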
 +
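* Data overload. Analysts and institutions are not used to dealing effectively with a large number of variables, which can be done efficiently with interactive dashboards. Practitioners still lack a standard workflow that would allow researchers, users, and policymakers to perform their tasks efficiently and effectively.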
   −
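Generalizability. While observational data always represents its source very well, it only represents what it represents, and nothing more. While it is tempting to generalize from specific observations of one platform to broader settings, this is often very deceptive.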
+
===Healthcare===
   −
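Harmonization. Digital trace data still requires the harmonization of international indicators. It adds the challenge of so-called "data fusion", the harmonization of different sources.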
+
=== Healthcare ===
 
  −
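Data overload. Analysts and institutions are not used to dealing effectively with a large number of variables, which is done efficiently with interactive dashboards. Practitioners still lack a standard workflow that would allow researchers, users, and policymakers to perform their tasks efficiently and effectively.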
  −
 
  −
===Healthcare===
   
Big data analytics has been used in healthcare to provide personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, waste and care variability reduction, automated external and internal reporting of patient data, standardized medical terms and patient registries.<ref name="ref135">{{cite journal | vauthors = Huser V, Cimino JJ | title = Impending Challenges for the Use of Big Data | journal = International Journal of Radiation Oncology, Biology, Physics | volume = 95 | issue = 3 | pages = 890–894 | date = July 2016 | pmid = 26797535 | pmc = 4860172 | doi = 10.1016/j.ijrobp.2015.10.060 }}</ref><ref>{{Cite book|title=Signal Processing and Machine Learning for Biomedical Big Data.|others=Sejdić, Ervin, Falk, Tiago H.|isbn=9781351061216|location=[Place of publication not identified]|oclc=1044733829|last1 = Sejdic|first1 = Ervin|last2 = Falk|first2 = Tiago H.|date = 4 July 2018}}</ref><ref>{{cite journal | vauthors = Raghupathi W, Raghupathi V | title = Big data analytics in healthcare: promise and potential | journal = Health Information Science and Systems | volume = 2 | issue = 1 | pages = 3 | date = December 2014 | pmid = 25825667 | pmc = 4341817 | doi = 10.1186/2047-2501-2-3 }}</ref><ref>{{cite journal | vauthors = Viceconti M, Hunter P, Hose R | title = Big data, big knowledge: big data for personalized healthcare | journal = IEEE Journal of Biomedical and Health Informatics | volume = 19 | issue = 4 | pages = 1209–15 | date = July 2015 | pmid = 26218867 | doi = 10.1109/JBHI.2015.2406883 | s2cid = 14710821 | url = http://eprints.whiterose.ac.uk/89104/1/pap%20JBHI%20BigData%20in%20VPH%20revision%20v2.pdf | doi-access = free }}</ref> Some areas of improvement are more aspirational than actually implemented. The level of data generated within [[Health system|healthcare systems]] is not trivial. With the added adoption of mHealth, eHealth and wearable technologies, the volume of data will continue to increase. 
This includes [[electronic health record]] data, imaging data, patient generated data, sensor data, and other forms of difficult-to-process data. There is now an even greater need for such environments to pay greater attention to data and information quality.<ref>{{cite journal|title=Data Management Within mHealth Environments: Patient Sensors, Mobile Devices, and Databases |first1=John| last1=O'Donoghue |first2=John|last2=Herbert|s2cid=2318649|date=1 October 2012|volume=4|issue=1|pages=5:1–5:20| doi=10.1145/2378016.2378021 |journal=Journal of Data and Information Quality}}</ref> "Big data very often means '[[dirty data]]' and the fraction of data inaccuracies increases with data volume growth." Human inspection at the big data scale is impossible and there is a desperate need in health services for intelligent tools for accuracy and believability control and for handling missing information.<ref name="Mirkes2016">{{cite journal | vauthors = Mirkes EM, Coats TJ, Levesley J, Gorban AN | title = Handling missing data in large healthcare dataset: A case study of unknown trauma outcomes | journal = Computers in Biology and Medicine | volume = 75 | pages = 203–16 | date = August 2016 | pmid = 27318570 | doi = 10.1016/j.compbiomed.2016.06.004 | arxiv = 1604.00627 | bibcode = 2016arXiv160400627M | s2cid = 5874067 }}</ref> While extensive information in healthcare is now electronic, it fits under the big data umbrella as most is unstructured and difficult to use.<ref>{{cite journal | vauthors = Murdoch TB, Detsky AS | title = The inevitable application of big data to health care | journal = JAMA | volume = 309 | issue = 13 | pages = 1351–2 | date = April 2013 | pmid = 23549579 | doi = 10.1001/jama.2013.393 }}</ref> The use of big data in healthcare has raised significant ethical challenges ranging from risks for individual rights, privacy and [[autonomy]], to transparency and trust.<ref>{{cite journal | vauthors = Vayena E, Salathé M, Madoff LC, Brownstein JS | title = 
Ethical challenges of big data in public health | journal = PLOS Computational Biology | volume = 11 | issue = 2 | pages = e1003904 | date = February 2015 | pmid = 25664461 | pmc = 4321985 | doi = 10.1371/journal.pcbi.1003904 | bibcode = 2015PLSCB..11E3904V }}</ref>
  