Data science

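Revision by 趣木木 (talk | contributions) as of 13:08, 9 May 2020 (Saturday)

This entry was machine-translated by 彩云小译 (Caiyun Xiaoyi) and has not yet been manually edited or proofread. We apologize for any inconvenience in reading.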

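  • Planned content of the entry

1. Foundations (basic background knowledge);

2. Evolution of the term's meaning (how the term arose and how its usage has varied so far);

3. Research topics in data science

3.1 Foundational theory of data science

3.2 Data preprocessing

3.3 Data computation

3.4 Data management

4. Careers and jobs in data science;

5. Impacts of data science;

6. Technologies and application software used in data science;

7. Differences between data science, artificial intelligence, and machine learning

Two blog posts found for reference: https://blog.csdn.net/fengdu78/article/details/105154546 https://blog.csdn.net/dev_csdn/article/details/79127658

8. Relationship to statistics


Of these, Part 2 needs material to be collected and added, Part 7 has some reference material (more will be found later), and Part 8 can be expanded.

Sections with English source text: the introduction and Parts 1, 2, 4, 5, 6, and 8. Sections with no English content: Parts 3 and 7.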

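  • Task assignment

Task 1: introduction, 1 Foundations, 2 Etymology, 3 Research topics [Owner: ]. The Foundations text needs to be translated; the introduction and Etymology already have reference material and a first-pass human translation; Research topics needs source material to be found and filled in.

Task 2: 4 Careers, 5 Impacts of data science [Owner: ]. These have no first-pass human translation; further material can be collected to make them more complete.

Task 3: 6 Technologies and application software, 7 Differences from machine learning and artificial intelligence, 8 Relationship to statistics [Owner: ]. Parts 7 and 8 need material to be collected and filled in; Part 8 already has reference material and a first-pass human translation.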


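  • Notes
  1. Completed tasks are due before 6 p.m. on May 10.
  2. Some sections have too little content; please use your own judgment to fill them out.
  3. To claim a task, add your name or nickname after the corresponding Task 1, 2, or 3.
  4. If you have relevant reference material, please share it and send it to 趣木木 so that it can be drawn on later for the editors' recommendations.
  5. If you think another section should be added, or if you run into any problem, message 趣木木 privately on WeChat as soon as possible.


The old version of the site had this entry; it could be fleshed out further at the methodological level.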



Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.[1][2] Data science is related to data mining and big data.



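 --趣木木 (talk): Below is the corresponding introduction from the old version of the entry, for reference; it can be consolidated or expanded.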


Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.[3] It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Turing award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.[4][5]



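Data science, similar to data mining, is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. [6] [7] The concept of data science combines statistics, data analysis, machine learning, and related methods in order to understand and analyze actual phenomena with data. [8] It draws on techniques and theories from many fields, including mathematics, statistics, information science, and computer science.

Turing Award winner Jim Gray envisioned data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that everything about science is changing because of the impact of information technology and the data deluge. [4] [5] "Data science" became a popular term after Harvard Business Review called it "the sexiest job of the 21st century" in 2012. [9] It is now often used interchangeably with earlier concepts such as business analytics, [10] business intelligence, predictive modeling, and statistics. Even the idea that data science is "sexy" echoes Dr. Hans Rosling, who said in a 2011 BBC documentary that "statistics is now the sexiest subject around." Nate Silver [11] described data science as a sexed-up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as "data science" to attract attention, which can dilute the usefulness of the term. [12] Although many university programs now offer data science degrees, there is no consensus on a definition of data science or on suitable curriculum content. [10] The discipline's standing has also suffered because many data science and big data projects fail to deliver useful results, often as a result of poor management and use of resources. [13] [14] [15] [16]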







Foundations

Data science is an interdisciplinary field focused on extracting knowledge from data sets, which are typically large (see big data).[17] The field encompasses analysis, preparing data for analysis, and presenting findings to inform high-level decisions in an organization. As such, it incorporates skills from computer science, mathematics, statistics, information visualization, graphic design, and business.[18][19] Statistician Nathan Yau, drawing on Ben Fry, also links data science to human-computer interaction: users should be able to intuitively control and explore data.[20][21] In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.[22]






Etymology



Early usage




In 1962, John Tukey described a field he called “data analysis,” which resembles modern data science.[23] Later, attendees at a 1992 statistics symposium at the University of Montpellier II acknowledged the emergence of a new discipline focused on data of various origins and forms, combining established concepts and principles of statistics and data analysis with computing.[24][25]




The term “data science” has been traced back to 1974, when Peter Naur proposed it as an alternative name for computer science.[6] In 1996, the International Federation of Classification Societies held the first conference to specifically feature data science as a topic.[6] However, the definition was still in flux. In 1997, C.F. Jeff Wu suggested that statistics should be renamed data science. He reasoned that a new name would help statistics shed inaccurate stereotypes, such as being synonymous with accounting, or limited to describing data.[26] In 1998, Chikio Hayashi argued for data science as a new, interdisciplinary concept, with three aspects: data design, collection, and analysis.[27]




During the 1990s, popular terms for the process of finding patterns in datasets (which were increasingly large) included “knowledge discovery” and “data mining.”[28][6]


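 --趣木木 (talk): Below is part of the old version's account of the etymology and evolution of the term "data science"; it can be consulted, consolidated, and expanded.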

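The term "data science" has appeared in various contexts over the past thirty years but did not become an established term until recently. In an early usage, Peter Naur used it in 1960 as a substitute for computer science. Naur later introduced the term "datalogy". [29] In 1974, Naur published Concise Survey of Computer Methods, which freely used the term "data science" in its survey of the data processing methods then in wide use.

In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe, Japan, for their biennial conference. There, after the term was introduced in a roundtable discussion by Chikio Hayashi, [8] "data science" was included in the title of the conference for the first time ("Data Science, Classification, and Related Methods"). [30]

In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" [31] for his appointment to the H. C. Carver Professorship at the University of Michigan. [32] In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making. In his conclusion, he initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians be called data scientists. [31] In 1998, he presented the lecture of the same title as the first of his P.C. Mahalanobis Memorial Lectures, [33] a series honoring P.C. Mahalanobis, the Indian scientist and statistician who founded the Indian Statistical Institute.

In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate advances in computing with data, in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics", published in volume 69, no. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. [34] In his report, Cleveland set out six technical areas that he believed to encompass the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.

In April 2002, the International Council for Science (ICSU): Committee on Data for Science and Technology (CODATA) [35] started the Data Science Journal, [36] a publication focused on issues such as the description of data systems, their publication on the internet, applications, and legal issues. [37] Shortly afterwards, in January 2003, Columbia University began publishing The Journal of Data Science, [38] which provided a platform for all data workers to present their views and exchange ideas. The journal was largely devoted to the application of statistical methods and quantitative research. In 2005, the National Science Board published "Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century", which defined data scientists as the information and computer scientists, database and software programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others who are crucial to the successful management of a digital data collection, and whose primary activity is to conduct creative inquiry and analysis. [39]

Around 2007, [40] Turing Award winner Jim Gray envisioned "data-driven science" as a "fourth paradigm" of science that uses the computational analysis of large data as its primary scientific method, [4] [5] and foresaw a world in which all scientific literature and all scientific data are online and interoperate with each other. [41]

In the 2012 Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century", [9] DJ Patil claims to have coined the term in 2008 together with Jeff Hammerbacher to describe their jobs at LinkedIn and Facebook, respectively. He asserts that the data scientist is a new breed of professional and that a shortage of data scientists is becoming a serious constraint in some sectors, while also describing a more business-oriented role.

In 2013, the IEEE Task Force on Data Science and Advanced Analytics [42] was launched. In the same year, the first European Conference on Data Analysis (ECDA) was held in Luxembourg, where the European Association for Data Science (EuADS) was established. The first international conference, the IEEE International Conference on Data Science and Advanced Analytics, was held in 2014. [43] Also in 2014, the coding bootcamp pioneer General Assembly launched a student-paid bootcamp, and The Data Incubator launched a competitive free data science fellowship. [44] That same year, the American Statistical Association's Section on Statistical Learning and Data Mining renamed its journal "Statistical Analysis and Data Mining: The ASA Data Science Journal", and in 2016 it changed its section name to "Statistical Learning and Data Science". [45] In 2015, Springer launched the International Journal on Data Science and Analytics [46] to publish original work on data science and big data analytics. In September 2015, the GfKl added "Data Science Society" to its name at the third ECDA conference, held at the University of Essex in Colchester, UK.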



Modern usage


The modern conception of data science as an independent discipline is sometimes attributed to William S. Cleveland.[47] In a 2001 paper, he advocated an expansion of statistics beyond theory into technical areas; because this would significantly change the field, it warranted a new name.[28] "Data science" became more widely used in the next few years: in 2002, the Committee on Data for Science and Technology launched Data Science Journal. In 2003, Columbia University launched The Journal of Data Science.[28] In 2014, the American Statistical Association's Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, reflecting the ascendant popularity of data science.[48]




The professional title of “data scientist” has been attributed to DJ Patil and Jeff Hammerbacher in 2008.[49] Though it was used by the National Science Board in their 2005 report, "Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century," it referred broadly to any key role in managing a digital data collection.[50]




There is still no consensus on the definition of data science and it is considered by some to be a buzzword.[51]



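Research topics

 --趣木木 (talk): Not limited to the items listed here; feel free to add others according to the research content.

Foundational theory of data science

Data preprocessing

Data computation

Data management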

Careers in data science

Data science is a growing field. Glassdoor ranked data scientist as the third-best job in America for 2020, and as the number one job from 2016 to 2019.[52] Data scientists have a median salary of $118,370 per year or $56.91 per hour.[53] Job growth in this field is also above average, with a projected increase of 16% from 2018 to 2028.[53] The largest employer of data scientists in the US is the federal government, employing 28% of the data science workforce.[53] Other large employers of data scientists are computer system design services, research and development laboratories, and colleges and universities.[53] Typically, data scientists work full time, and some work more than 40 hours a week.[53]





Educational path


Becoming a data scientist requires a significant amount of education and experience. The first step is to earn a bachelor's degree, typically in a field related to computing or mathematics.[54][53] Coding bootcamps are also available and can serve as an alternate pre-qualification to supplement a bachelor's degree in another field.[54] Most data scientists also complete a master’s degree or a PhD in data science.[54] Once these qualifications are met, the next step is to apply for an entry-level job in the field.[54] Some data scientists may later choose to specialize in a sub-field of data science.[54]





Specializations and associated careers




  • Machine Learning Scientist: Machine learning scientists research new methods of data analysis and create algorithms.[55]


  • Data Analyst: Data analysts utilize large data sets to gather information that meets their company’s needs.[55]


  • Data Consultant: Data consultants work with businesses to determine the best usage of the information yielded from data analysis.[54]


  • Data Architect: Data architects build data solutions that are optimized for performance and design applications.[55]


  • Applications Architect: Applications architects track how applications are used throughout a business and how they interact with users and other applications.[55]





Impacts of data science

 --趣木木 (talk): This section needs further additions; the content is too thin.

Big data is very quickly becoming a vital tool for businesses and companies of all sizes.[56] The availability and interpretation of big data has altered the business models of old industries and enabled the creation of new ones.[56] Data-driven businesses are worth $1.2 trillion collectively in 2020, an increase from $333 billion in the year 2015.[57] Data scientists are responsible for breaking down big data into usable information and creating software and algorithms that help companies and organizations determine optimal operations.[57] As big data continues to have a major impact on the world, data science does as well due to the close relationship between the two.[57]





Technologies and techniques

A variety of technologies and techniques are used in data science, depending on the application.





Techniques

  • Clustering is a technique used to group data together.



  • Machine learning is a technique used to perform tasks by inferring patterns from data.
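The two techniques above can be illustrated together with a small sketch. It is not taken from any source cited in this entry: the sample points, the function names `k_means` and `predict`, and the choice of the first k points as starting centroids are all assumptions made for illustration. The first function groups 2-D points with a basic k-means loop (clustering); the second reuses the learned centroids to assign a new point to a group, which is one simple instance of inferring a pattern from data.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic k-means loop."""
    # Assumption: the first k points serve as starting centroids.
    # This keeps the sketch deterministic but is not a robust initialisation.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def predict(point, centroids):
    """Return the index of the centroid closest to a new point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical sample data: two loose groups of points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)

print(predict((2.0, 2.0), centroids))  # prints 0
print(predict((7.0, 9.0), centroids))  # prints 1
```

Production work would more likely rely on an established library than on a hand-written loop; the sketch only shows the idea behind the two bullet points.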





Technologies




  • Python is a programming language with simple syntax that is commonly used for data science.[58] A number of Python libraries are used in data science, including NumPy, pandas, and SciPy.


  • R is a programming language that was designed for statisticians and data mining[59] and is optimized for computation.


  • TensorFlow is a framework for creating machine learning models developed by Google.


  • PyTorch is another framework for machine learning, developed by Facebook.


  • Jupyter Notebook is an interactive web interface for Python that allows faster experimentation.


  • Tableau makes a variety of software that is used for data visualization.[60]


  • Apache Hadoop is a software framework that is used to process data over large distributed systems.
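As a rough illustration of the kind of task these tools are used for, the sketch below reads a small table and computes a per-group average using only Python's standard library. The data, the column names, and the function `mean_by_group` are invented for this example; in practice a library such as pandas would typically handle the same job with far less code.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data, invented for illustration only.
RAW = """city,temperature_c
Oslo,4.0
Oslo,6.0
Cairo,30.0
Cairo,34.0
"""


def mean_by_group(text, key, value):
    """Return {group: mean of value} for CSV text that has a header row."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Collect the numeric values under each group label.
        groups.setdefault(row[key], []).append(float(row[value]))
    return {name: mean(values) for name, values in groups.items()}


averages = mean_by_group(RAW, "city", "temperature_c")
print(averages)  # prints {'Oslo': 5.0, 'Cairo': 32.0}
```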


Similarities and differences with machine learning and artificial intelligence

Relationship to statistics

Many statisticians, including Nate Silver, have argued that data science is not a new field, but rather another name for statistics.[61] Others argue that data science is distinct from statistics because it focuses on problems and techniques unique to digital data.[62] Vasant Dhar writes that statistics emphasizes quantitative data and description. In contrast, data science deals with quantitative and qualitative data (e.g. images) and emphasizes prediction and action.[63] Andrew Gelman of Columbia University and data scientist Vincent Granville have described statistics as a nonessential part of data science.[64][65]
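One way to picture the contrast Dhar draws between description and prediction is the following sketch. It is an illustration only, not an example from Dhar's article: the numbers are made up, and the helper `fit_line` is a hypothetical name for an ordinary least-squares fit. The first part summarises data that has already been observed; the second part uses the same data to estimate a value that has not been observed.

```python
from statistics import mean, stdev

# Hypothetical observations, invented for illustration:
# x = hours studied, y = test score.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 54.0, 56.0, 58.0, 60.0]

# Description: summarise the data already collected.
summary = {"mean_y": mean(ys), "stdev_y": stdev(ys)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Prediction: estimate the score for an unseen input (6 hours).
slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept

print(summary["mean_y"])  # prints 56.0
print(predicted)          # prints 62.0
```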




Stanford professor David Donoho writes that data science is not distinguished from statistics by the size of datasets or use of computing, and that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data science program. He describes data science as an applied field growing out of traditional statistics.[23]



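  --趣木木 (talk): Below is the corresponding section from the old version of the entry; it can be consulted, consolidated, and expanded.

The rapid growth in job openings shows that the concept of "data science" has become popular almost overnight in business and academia. [66] However, many critical academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced "business analytics" in contexts such as graduate degree programs. [10] In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, the noted applied statistician Nate Silver said, "I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn't berate the term statistician." [11] Similarly, in the business sector, several researchers and analysts state that data scientists alone are far from sufficient to give companies a real competitive advantage, [67] and they regard data scientists as only one of four broader job families that companies need in order to leverage big data effectively, namely data analysts, data scientists, big data developers, and big data engineers. [68]

On the other hand, there have been many responses to this criticism. In a 2014 Wall Street Journal article, Irving Wladawsky-Berger compares the enthusiasm for data science with the dawn of computer science. He argues that data science, like any other interdisciplinary field, draws on methodologies and practices from across academia and industry but will then turn them into a new discipline. He calls particular attention to the sharp criticism once faced by computer science, now a well-respected academic discipline. [28] Likewise, Vasant Dhar of NYU's Stern School of Business, like many other academic proponents of data science, [28] argued more specifically in December 2013 that data science differs from the existing practice of data analysis across all disciplines, which focuses only on explaining data sets. Data science seeks actionable and consistent patterns for predictive use. [6] This practical engineering goal takes data science beyond traditional analytics. Data in disciplines and applied fields that have lacked solid theories, such as health science and social science, can now be used to build powerful predictive models. [6]

In September 2015, Stanford professor David Donoho, in an effort similar to Dhar's, took the argument further by rejecting three simplistic and misleading definitions of data science put forward by critics. [53] First, for Donoho, data science does not equate to big data, because the size of the data set is not a criterion for distinguishing data science from statistics. [53] Second, data science is not defined by the computing skills needed to sort big data sets, because these skills are already widely used for analysis across all disciplines. [53] Third, data science is a heavily applied field for which academic programs do not yet sufficiently prepare data scientists, because many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data science program. [53] [69] As a statistician, Donoho follows many predecessors in his field in championing a broader scope for data science, [53] much as John Chambers urges statisticians to adopt an inclusive concept of learning from data, [70] and William Cleveland urges giving the extraction of applicable predictive tools from data priority over explanatory theories. [34] Together, these statisticians envision an increasingly inclusive applied field that grows out of traditional statistics and goes beyond it.

For the future of data science, Donoho projects an ever-growing environment for open science in which the data sets used for academic publications are accessible to all researchers. [53] The US National Institutes of Health has already announced plans to enhance the reproducibility and transparency of research data. [71] Other major journals are following suit. [72] [73] In this way, the future of data science not only exceeds the boundaries of statistical theory in scale and methodology; data science will also revolutionize current academic and research paradigms. [53] As Donoho concludes, "the scope and impact of data science will continue to expand enormously in coming decades as scientific data and data about science itself become ubiquitously available." [53]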



References


  1. Dhar, V. (2013). "Data science and prediction". Communications of the ACM. 56 (12): 64–73. doi:10.1145/2500499. Archived from the original on 9 November 2014. Retrieved 2 September 2015.
  2. Jeff Leek (2013-12-12). "The key word in "Data Science" is not Data, it is Science". Simply Statistics. Archived from the original on 2 January 2014. Retrieved 1 January 2014.
  3. Hayashi, Chikio (1998-01-01). "What is Data Science? Fundamental Concepts and a Heuristic Example". In Hayashi, Chikio (in en). Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Japan. pp. 40–51. doi:10.1007/978-4-431-65950-1_3. ISBN 9784431702085. https://www.springer.com/book/9784431702085. 
  4. Stewart Tansley; Kristin Michele Tolle (2009). The Fourth Paradigm: Data-intensive Scientific Discovery. Microsoft Research. ISBN 978-0-9825442-0-4. https://books.google.com/?id=oGs_AQAAIAAJ.
  5. Bell, G.; Hey, T.; Szalay, A. (2009). "COMPUTER SCIENCE: Beyond the Data Deluge". Science. 323 (5919): 1297–1298. doi:10.1126/science.1170411. ISSN 0036-8075. PMID 19265007.
  6. Dhar, V. (2013). "Data science and prediction". Communications of the ACM. 56 (12): 64. doi:10.1145/2500499.
  7. Jeff Leek (2013-12-12). "The key word in "Data Science" is not Data, it is Science". Simply Statistics.
  8. Hayashi, Chikio (1998-01-01). "What is Data Science? Fundamental Concepts and a Heuristic Example". In Hayashi, Chikio (in en). Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Japan. pp. 40–51. https://link.springer.com/chapter/10.1007/978-4-431-65950-1_3.
  9. Davenport, Thomas H.; Patil, DJ (Oct 2012). "Data Scientist: The Sexiest Job of the 21st Century". Harvard Business Review.
  10. "Data Science: What's The Half-Life Of A Buzzword?". Forbes. 2013-08-19.
  11. "Nate Silver: What I need from statisticians". 23 Aug 2013.
  12. Warden, Pete(2011-05-09). "Why the term "data science" is flawed but useful" O'Reilly Radar. Retrieved 2018-05-20.
  13. "Are You Setting Your Data Scientists Up to Fail?". Harvard Business Review.2018-01-25. Retrieved 2018-05-26.
  14. "70% of Big Data projects in UK fail to realise full potential" www.consultancy.uk. Retrieved 2018-05-26.
  15. "The Data Economy: Why do so many analytics projects fail? - Analytics Magazine". Analytics Magazine. 2014-07-07. Retrieved 2018-05-26.
  16. "Data Science: 4 Reasons Why Most Are Failing to Deliver". www.kdnuggets.com. Retrieved 2018-05-26.
  17. "About Data Science | Data Science Association". www.datascienceassn.org. Retrieved 2020-04-03.
  18. "1. Introduction: What Is Data Science? - Doing Data Science [Book]". www.oreilly.com (in English). Retrieved 2020-04-03.
  19. "the three sexy skills of data geeks". m.e.driscoll: data utopian (in English). Retrieved 2020-04-03.
  20. Yau, Nathan (2009-06-04). "Rise of the Data Scientist". FlowingData (in English). Retrieved 2020-04-03.
  21. "Basic Example". benfry.com. Retrieved 2020-04-03.
  22. "ASA Statement on the Role of Statistics in Data Science". AMSTATNEWS. American Statistical Association. 2015-10-01. Archived from the original on 20 June 2019. Retrieved 2019-05-29.
  23. Donoho, David (September 18, 2015). "50 years of Data Science" (PDF). Retrieved April 2, 2020.
  24. Escoufier, Yves; Hayashi, Chikio; Fichet, Bernard (1995). Data science and its applications = La science des données et ses applications. Tokyo: Academic Press/Harcourt Brace. ISBN 0-12-241770-4. OCLC 489990740.
  25. Murtagh, Fionn; Devlin, Keith (2018). "The Development of Data Science: Implications for Education, Employment, Research, and the Data Revolution for Sustainable Development". Big Data and Cognitive Computing (in English). 2 (2): 14. doi:10.3390/bdcc2020014.
  26. Wu, C.F. Jeff. "Statistics=Data Science?" (PDF). Retrieved April 2, 2020.
  27. Murtagh, Fionn; Devlin, Keith (2018). "The Development of Data Science: Implications for Education, Employment, Research, and the Data Revolution for Sustainable Development". Big Data and Cognitive Computing (in English). 2 (2): 14. doi:10.3390/bdcc2020014.
  28. Press, Gil. "A Very Short History Of Data Science". Forbes (in English). Retrieved 2020-04-03.
  29. Naur, Peter (1 July 1966). "The science of datalogy". Communications of the ACM. 9 (7): 485. doi:10.1145/365719.366510.
  30. Press, Gil. "A Very Short History Of Data Science".
  31. Wu, C. F. J. (1997). "Statistics = Data Science?". Retrieved 9 October 2014.
  32. "Identity of statistics in science examined" .The University Records, 9 November 1997, The University of Michigan. Retrieved 12 August 2013.
  33. "P.C. Mahalanobis Memorial Lectures, 7th series". P.C. Mahalanobis Memorial Lectures, Indian Statistical Institute. Archived from the original on 26 Feb 2017. Retrieved 18 Jul 2017.
  34. Cleveland, W. S. (2001). Data science: an action plan for expanding the technical areas of the field of statistics. International Statistical Review / Revue Internationale de Statistique, 21–26.
  35. International Council for Science : Committee on Data for Science and Technology. (2012, April). CODATA, The Committee on Data for Science and Technology. Retrieved from International Council for Science : Committee on Data for Science and Technology: http://www.codata.org/
  36. Data Science Journal. (2012, April). Available Volumes. Retrieved from Japan Science and Technology Information Aggregator, Electronic: http://www.jstage.jst.go.jp/browse/dsj/_vols
  37. Data Science Journal. (2002, April). Contents of Volume 1, Issue 1, April 2002. Retrieved from Japan Science and Technology Information Aggregator, Electronic: http://www.jstage.jst.go.jp/browse/dsj/1/0/_contents
  38. The Journal of Data Science. (2003, January). Contents of Volume 1, Issue 1, January 2003. Retrieved from http://www.jds-online.com/v1-1
  39. National Science Board. Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century . National Science Foundation . Retrieved 30 June 2013.
  40. Citation needed
  41. Markoff,John(2009-12-14). "Essays Inspired by Microsoft’s Jim Gray, Who Saw Science Paradigm Shift". The New York Times. Retrieved 2018-04-26.
  42. "IEEE Task Force on Data Science and Advanced Analytics"
  43. "2014 IEEE International Conference on Data Science and Advanced Analytics"
  44. "NY gets new bootcamp for data scientists: It’s free, but harder to get into than Harvard ". Venture Beat Retrieved 2016-02-22.
  45. Talley,Jill(2016-06-01) "ASA Expands Scope, Outreach to Foster Growth, Collaboration in Data Science" . AMSTATNEWS. American Statistical Association. Retrieved 2017-02-04
  46. "Journal on Data Science and Analytics"
  47. Gupta, Shanti (December 11, 2015). "William S Cleveland". Retrieved April 2, 2020.
  48. Talley, Jill (June 1, 2016). "ASA Expands Scope, Outreach to Foster Growth, Collaboration in Data Science". Amstat News. American Statistical Association.
  49. Davenport, Thomas H.; Patil, D. J. (2012-10-01). "Data Scientist: The Sexiest Job of the 21st Century". Harvard Business Review. No. October 2012. ISSN 0017-8012. Retrieved 2020-04-03.
  50. "US NSF - NSB-05-40, Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century". www.nsf.gov. Retrieved 2020-04-03.
  51. Press, Gil. "Data Science: What's The Half-Life Of A Buzzword?". Forbes (in English). Retrieved 2020-04-03.
  52. "Best Jobs in America". Glassdoor (in English). Retrieved 2020-04-03.
  53. "Computer and Information Research Scientists : Occupational Outlook Handbook: : U.S. Bureau of Labor Statistics". www.bls.gov (in English). Retrieved 2020-04-03.
  54. "What is a Data Scientist?". Master's in Data Science (in English). Retrieved 2020-04-03.
  55. "11 Data Science Careers Shaping the Future". Northeastern University Graduate Programs (in English). 2018-11-23. Retrieved 2020-04-03.
  56. Pham, Peter. "The Impacts Of Big Data That You May Not Have Heard Of". Forbes (in English). Retrieved 2020-04-03.
  57. Martin, Sophia (2019-09-20). "How Data Science will Impact Future of Businesses?". Medium (in English). Retrieved 2020-04-03.
  58. Shell, M Scott (September 24, 2019). "An introduction to Python for scientific computing" (PDF). Retrieved April 2, 2020.
  59. "R FAQ". cran.r-project.org. Retrieved 2020-04-03.
  60. Rhodes, Margaret (15 July 2014). "A Dead-Simple Tool That Lets Anyone Create Interactive Maps". Wired. Retrieved 2020-04-03.
  61. "Nate Silver: What I need from statisticians - Statistics Views". www.statisticsviews.com. Retrieved 2020-04-03.
  62. "What's the Difference Between Data Science and Statistics?". Priceonomics (in English). Retrieved 2020-04-03.
  63. Dhar, Vasant (2013-12-01). "Data science and prediction". Communications of the ACM (in English). 56 (12): 64–73. doi:10.1145/2500499.
  64. "Statistics is the least important part of data science « Statistical Modeling, Causal Inference, and Social Science". statmodeling.stat.columbia.edu. Retrieved 2020-04-03.
  65. Granville, Vincent (2014-12-08). "Data science without statistics is possible, even desirable". www.datasciencecentral.com (in English). Retrieved 2020-04-03.
  66. Darrow,Barb(May 21, 2015). "Data science is still white hot, but nothing lasts forever" .Fortune. Retrieved November 20, 2017.
  67. Miller, Steven (2014-04-10). "Collaborative Approaches Needed to Close the Big Data Skills Gap". Journal of Organization Design (in English). 3 (1): 26–30. doi:10.7146/jod.9823. ISSN 2245-408X.
  68. De Mauro, Andrea; Greco, Marco; Grimaldi, Michele; Ritala, Paavo. "Human resources for Big Data professions: A systematic classification of job roles and required skill sets". Information Processing & Management. doi:10.1016/j.ipm.2017.05.004.
  69. Barlow, Mike (2013). The Culture of Big Data. O'Reilly Media, Inc.. 
  70. Chambers, John M. (1993-12-01). "Greater or lesser statistics: a choice for future research". Statistics and Computing (in English). 3 (4): 182–184. doi:10.1007/BF00141776. ISSN 0960-3174.
  71. Collins, Francis S.; Tabak, Lawrence A. (2014-01-30). "NIH plans to enhance reproducibility". Nature. 505 (7485): 612–613. doi:10.1038/505612a. ISSN 0028-0836. PMC 4058759. PMID 24482835.
  72. McNutt, Marcia (2014-01-17). "Reproducibility". Science (in English). 343 (6168): 229. doi:10.1126/science.1250475. ISSN 0036-8075. PMID 24436391.
  73. Peng, Roger D. (2009-07-01). "Reproducible research and Biostatistics". Biostatistics (in English). 10 (3): 405–408. doi:10.1093/biostatistics/kxp014. ISSN 1465-4644.

Category:Information science

Category:Computer occupations

Category:Computational fields of study

Category:Data analysis


This page was moved from wikipedia:en:Data science. Its edit history can be viewed at 数据科学/edithistory