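This entry was machine-translated by Caiyun Xiaoyi and has not yet been manually edited or reviewed; apologies for any inconvenience in reading.
{{Distinguish|information science}}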
{{Use dmy dates|date=December 2012}}
{{Machine learning bar}}

'''Data science''' is an [[inter-disciplinary|interdisciplinary]] field that uses scientific methods, processes, algorithms and systems to extract [[knowledge]] and insights from structured and [[unstructured data]].<ref>{{Cite journal | last1 = Dhar | first1 = V. | title = Data science and prediction | doi = 10.1145/2500499 | journal = Communications of the ACM | volume = 56 | issue = 12 | pages = 64–73 | year = 2013 | url = http://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext | access-date = 2 September 2015 | archive-url = https://web.archive.org/web/20141109113411/http://cacm.acm.org/magazines/2013/12/169933-data-science-and-prediction/fulltext | archive-date = 9 November 2014 | url-status = live }}</ref><ref>{{cite web | url=http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/ | title=The key word in "Data Science" is not Data, it is Science | publisher=Simply Statistics | date=2013-12-12 | author=[[Jeffrey T. Leek|Jeff Leek]] | access-date=1 January 2014 | archive-url=https://web.archive.org/web/20140102194117/http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/ | archive-date=2 January 2014 | url-status=live }}</ref> Data science is related to [[data mining]] and [[big data]].

Data science is a "concept to unify [[statistics]], [[data analysis]], [[machine learning]] and their related methods" in order to "understand and analyze actual phenomena" with data.<ref>{{Cite book|chapter-url=https://www.springer.com/book/9784431702085|title=Data Science, Classification, and Related Methods|last=Hayashi|first=Chikio|date=1998-01-01|publisher=Springer Japan|isbn=9784431702085|editor-last=Hayashi|editor-first=Chikio|series=Studies in Classification, Data Analysis, and Knowledge Organization|location=|pages=40–51|language=en|chapter=What is Data Science? Fundamental Concepts and a Heuristic Example|doi=10.1007/978-4-431-65950-1_3|editor-last2=Yajima|editor-first2=Keiji|editor-last3=Bock|editor-first3=Hans-Hermann|editor-last4=Ohsumi|editor-first4=Noboru|editor-last5=Tanaka|editor-first5=Yutaka|editor-last6=Baba|editor-first6=Yasumasa}}</ref> It uses techniques and theories drawn from many fields within the context of [[mathematics]], [[statistics]], [[computer science]], and [[information science]]. 
[[Turing award|Turing Award]] winner [[Jim Gray (computer scientist)|Jim Gray]] envisioned data science as a "fourth paradigm" of science ([[Empirical research|empirical]], [[Basic research|theoretical]], [[computational science|computational]], and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the [[information explosion|data deluge]].<ref name="TansleyTolle2009">{{cite book|author1=Stewart Tansley|author2=Kristin Michele Tolle|title=The Fourth Paradigm: Data-intensive Scientific Discovery|url=https://books.google.com/?id=oGs_AQAAIAAJ|year=2009|publisher=Microsoft Research|isbn=978-0-9825442-0-4|access-date=16 December 2016|archive-url=https://web.archive.org/web/20170320193019/https://books.google.com/books?id=oGs_AQAAIAAJ|archive-date=20 March 2017|url-status=live}}</ref><ref name="BellHey2009">{{cite journal|last1=Bell|first1=G.|last2=Hey|first2=T.|last3=Szalay|first3=A.|title=COMPUTER SCIENCE: Beyond the Data Deluge|journal=Science|volume=323|issue=5919|year=2009|pages=1297–1298|issn=0036-8075|doi=10.1126/science.1170411|pmid=19265007}}</ref>

__TOC__

== Foundations ==

Data science is an interdisciplinary field focused on extracting knowledge from data sets, which are typically large (see [[big data]]).<ref>{{Cite web|url=http://www.datascienceassn.org/about-data-science|title=About Data Science {{!}} Data Science Association|website=www.datascienceassn.org|access-date=2020-04-03}}</ref> The field encompasses analysis, preparing data for analysis, and presenting findings to inform high-level decisions in an organization. As such, it incorporates skills from computer science, mathematics, statistics, [[information visualization]], graphic design, and business.<ref>{{Cite web|url=https://www.oreilly.com/library/view/doing-data-science/9781449363871/ch01.html|title=1. Introduction: What Is Data Science? - Doing Data Science [Book]|website=www.oreilly.com|language=en|access-date=2020-04-03}}</ref><ref>{{Cite web|url=https://medriscoll.com/post/4740157098/the-three-sexy-skills-of-data-geeks|title=the three sexy skills of data geeks|website=m.e.driscoll: data utopian|language=en|access-date=2020-04-03}}</ref> Statistician [[Nathan Yau]], drawing on [[Ben Fry]], also links data science to [[Human–computer interaction|human-computer interaction]]: users should be able to intuitively control and explore data.<ref>{{Cite web|url=https://flowingdata.com/2009/06/04/rise-of-the-data-scientist/|title=Rise of the Data Scientist|last=Yau|first=Nathan|date=2009-06-04|website=FlowingData|language=en|access-date=2020-04-03}}</ref><ref>{{Cite web|url=https://benfry.com/phd/dissertation/2.html|title=Basic Example|website=benfry.com|access-date=2020-04-03}}</ref> In 2015, the [[American Statistical Association]] identified [[Database|database management]], statistics and [[machine learning]], and [[Distributed computing|distributed and parallel systems]] as the three emerging foundational professional communities.<ref>{{Cite web|url=https://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/|title=ASA Statement on the Role of Statistics in Data Science|date=2015-10-01|website=AMSTATNEWS|publisher=[[American Statistical Association]]|access-date=2019-05-29|archive-url=https://web.archive.org/web/20190620184935/https://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/|archive-date=20 June 2019|url-status=live}}</ref>

===Relationship to statistics===

Many statisticians, including [[Nate Silver]], have argued that data science is not a new field, but rather another name for statistics.<ref>{{Cite web|url=https://www.statisticsviews.com/details/feature/5133141/Nate-Silver-What-I-need-from-statisticians.html|title=Nate Silver: What I need from statisticians - Statistics Views|website=www.statisticsviews.com|access-date=2020-04-03}}</ref> Others argue that data science is distinct from statistics because it focuses on problems and techniques unique to digital data.<ref>{{Cite web|url=http://priceonomics.com/whats-the-difference-between-data-science-and/|title=What's the Difference Between Data Science and Statistics?|website=Priceonomics|language=en|access-date=2020-04-03}}</ref> [[Vasant Dhar]] writes that statistics emphasizes quantitative data and description, whereas data science deals with both quantitative and qualitative data (such as images) and emphasizes prediction and action.<ref>{{Cite journal|last=Dhar|first=Vasant|date=2013-12-01|title=Data science and prediction|journal=Communications of the ACM|volume=56|issue=12|pages=64–73|language=EN|doi=10.1145/2500499}}</ref> [[Andrew Gelman]] of Columbia University and data scientist Vincent Granville have described statistics as a minor or even dispensable part of data science.<ref>{{Cite web|url=https://statmodeling.stat.columbia.edu/2013/11/14/statistics-least-important-part-data-science/|title=Statistics is the least important part of data science « Statistical Modeling, Causal Inference, and Social Science|website=statmodeling.stat.columbia.edu|access-date=2020-04-03}}</ref><ref>{{Cite web|url=https://www.datasciencecentral.com/profiles/blogs/data-science-without-statistics-is-possible-even-desirable|title=Data science without statistics is possible, even desirable|last=Granville|first=Vincent|date=2014-12-08|website=www.datasciencecentral.com|language=en|access-date=2020-04-03}}</ref>

Stanford professor [[David Donoho]] writes that data science is not distinguished from statistics by the size of datasets or use of computing, and that many graduate programs misleadingly advertise their analytics and statistics training as the essence of a data science program. He describes data science as an applied field growing out of traditional statistics.<ref name=":7" />

== Etymology ==

=== Early usage ===

In 1962, [[John Tukey]] described a field he called “data analysis,” which resembles modern data science.<ref name=":7">{{Cite web|url=http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf|title=50 years of Data Science|last=Donoho|first=David|date=September 18, 2015|access-date=April 2, 2020}}</ref> Later, attendees at a 1992 statistics symposium at the [[Montpellier 2 University|University of Montpellier II]] acknowledged the emergence of a new discipline focused on data of various origins and forms, combining established concepts and principles of statistics and data analysis with computing.<ref>{{Cite book|title=Data science and its applications = La science des données et ses applications|date=1995|publisher=Academic Press/Harcourt Brace|others=Escoufier, Yves; Hayashi, Chikio; Fichet, Bernard|isbn=0-12-241770-4|location=Tokyo|oclc=489990740}}</ref><ref>{{Cite journal|last=Murtagh|first=Fionn|last2=Devlin|first2=Keith|date=2018|title=The Development of Data Science: Implications for Education, Employment, Research, and the Data Revolution for Sustainable Development|url=https://www.mdpi.com/2504-2289/2/2/14|journal=Big Data and Cognitive Computing|language=en|volume=2|issue=2|pages=14|doi=10.3390/bdcc2020014|doi-access=free}}</ref>

The term “data science” has been traced back to 1974, when [[Peter Naur]] proposed it as an alternative name for computer science.<ref name=":0">{{Cite journal|last=Cao|first=Longbing|date=2017-06-29|title=Data Science|journal=ACM Computing Surveys (CSUR)|volume=50|issue=3|pages=1–42|language=EN|doi=10.1145/3076253|doi-access=free}}</ref> In 1996, the conference of the International Federation of Classification Societies became the first conference to specifically feature data science as a topic.<ref name=":0" /> However, the definition was still in flux. In 1997, [[C.F. Jeff Wu]] suggested that statistics should be renamed data science. He reasoned that a new name would help statistics shed inaccurate stereotypes, such as being synonymous with accounting or limited to describing data.<ref>{{Cite web|url=http://www2.isye.gatech.edu/~jeffwu/presentations/datascience.pdf|title=Statistics=Data Science?|last=Wu|first=C.F. Jeff|access-date=April 2, 2020}}</ref> In 1998, Chikio Hayashi argued for data science as a new, interdisciplinary concept with three aspects: data design, collection, and analysis.<ref>{{Cite journal|last=Murtagh|first=Fionn|last2=Devlin|first2=Keith|date=2018|title=The Development of Data Science: Implications for Education, Employment, Research, and the Data Revolution for Sustainable Development|url=https://www.mdpi.com/2504-2289/2/2/14|journal=Big Data and Cognitive Computing|language=en|volume=2|issue=2|pages=14|doi=10.3390/bdcc2020014|doi-access=free}}</ref>

During the 1990s, popular terms for the process of finding patterns in increasingly large datasets included “knowledge discovery” and “data mining.”<ref name=":1">{{Cite web|url=https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/|title=A Very Short History Of Data Science|last=Press|first=Gil|website=Forbes|language=en|access-date=2020-04-03}}</ref><ref name=":0" />

=== Modern usage ===

The modern conception of data science as an independent discipline is sometimes attributed to [[William S. Cleveland]].<ref>{{Cite web|url=https://www.stat.purdue.edu/~wsc/|title=William S Cleveland|last=Gupta|first=Shanti|date=December 11, 2015|access-date=April 2, 2020}}</ref> In a 2001 paper, he advocated an expansion of statistics beyond theory into technical areas; because this would significantly change the field, it warranted a new name.<ref name=":1" /> "Data science" became more widely used in the next few years: in 2002, the [[Committee on Data for Science and Technology]] launched ''Data Science Journal''. In 2003, Columbia University launched ''The Journal of Data Science''.<ref name=":1" /> In 2014, the [[American Statistical Association]]'s Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, reflecting the ascendant popularity of data science.<ref>{{Cite news|last=Talley|first=Jill|url=https://magazine.amstat.org/blog/2016/06/01/datascience-2/|title=ASA Expands Scope, Outreach to Foster Growth, Collaboration in Data Science|date=June 1, 2016|work=Amstat News|publisher=American Statistical Association}}</ref>

The professional title of “data scientist” has been attributed to [[DJ Patil]] and [[Jeff Hammerbacher]], who introduced it in 2008.<ref>{{Cite news|last=Davenport|first=Thomas H.|url=https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century|title=Data Scientist: The Sexiest Job of the 21st Century|date=2012-10-01|work=Harvard Business Review|access-date=2020-04-03|last2=Patil|first2=D. J.|issue=October 2012|issn=0017-8012}}</ref> The term had appeared earlier, in the [[National Science Board]]'s 2005 report "Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century," but there it referred broadly to any key role in managing a digital data collection.<ref>{{Cite web|url=https://www.nsf.gov/pubs/2005/nsb0540/|title=US NSF - NSB-05-40, Long-Lived Digital Data Collections Enabling Research and Education in the 21st Century|website=www.nsf.gov|access-date=2020-04-03}}</ref>

There is still no consensus on the definition of data science and it is considered by some to be a buzzword.<ref>{{Cite web|url=https://www.forbes.com/sites/gilpress/2013/08/19/data-science-whats-the-half-life-of-a-buzzword/|title=Data Science: What's The Half-Life Of A Buzzword?|last=Press|first=Gil|website=Forbes|language=en|access-date=2020-04-03}}</ref>

== Careers in data science ==

Data science is a growing field. Glassdoor ranked data scientist as the third-best job in America for 2020, after ranking it the best job from 2016 to 2019.<ref>{{Cite web|url=https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm|title=Best Jobs in America|website=Glassdoor|language=en|access-date=2020-04-03}}</ref> For the closely related occupation of computer and information research scientists, the U.S. Bureau of Labor Statistics reports a median salary of $118,370 per year, or $56.91 per hour.<ref name=":2">{{Cite web|url=https://www.bls.gov/ooh/computer-and-information-technology/computer-and-information-research-scientists.htm|title=Computer and Information Research Scientists : Occupational Outlook Handbook: : U.S. Bureau of Labor Statistics|website=www.bls.gov|language=en-us|access-date=2020-04-03}}</ref> Employment in this occupation is projected to grow 16% from 2018 to 2028, faster than the average for all occupations.<ref name=":2" /> Its largest employer is the U.S. federal government, which employs 28% of this workforce.<ref name=":2" /> Other large employers are computer systems design services, research and development laboratories, and colleges and universities.<ref name=":2" /> Most of these workers are employed full time, and some work more than 40 hours a week.<ref name=":2" />
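
The annual and hourly median salary figures are consistent with each other if a full-time year is taken as 2,080 hours (40 hours a week for 52 weeks, the full-time convention used in BLS wage statistics); this is an illustrative check, not an additional figure from the source:
<math display="block">\frac{118{,}370\ \text{USD per year}}{40\ \text{h/week} \times 52\ \text{weeks/year}} = \frac{118{,}370}{2{,}080}\ \text{USD/h} \approx 56.91\ \text{USD/h}</math>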

=== Educational path ===

Becoming a data scientist typically requires substantial education and experience. The first step is to earn a bachelor's degree, usually in a field related to computing or mathematics.<ref name=":3">{{Cite web|url=https://www.mastersindatascience.org/careers/data-scientist/|title=What is a Data Scientist?|website=Master's in Data Science|language=en-US|access-date=2020-04-03}}</ref><ref name=":2" /> Coding bootcamps offer an alternative pre-qualification that can supplement a bachelor's degree in another field.<ref name=":3" /> Most data scientists also complete a master's degree or a PhD in data science.<ref name=":3" /> Once these qualifications are met, the next step is to apply for an entry-level job in the field.<ref name=":3" /> Some data scientists later choose to specialize in a sub-field of data science.<ref name=":3" />

=== Specializations and associated careers ===

* Machine Learning Scientist: Machine learning scientists research new methods of data analysis and create algorithms.<ref name=":4">{{Cite web|url=https://www.northeastern.edu/graduate/blog/data-science-careers-shaping-our-future/|title=11 Data Science Careers Shaping the Future|date=2018-11-23|website=Northeastern University Graduate Programs|language=en-US|access-date=2020-04-03}}</ref>
* Data Analyst: Data analysts use large data sets to gather information that meets their company's needs.<ref name=":4" />
* Data Consultant: Data consultants work with businesses to determine the best use of the information yielded by data analysis.<ref name=":3" />
* Data Architect: Data architects build performance-optimized data solutions and design applications.<ref name=":4" />
* Applications Architect: Applications architects track how applications are used throughout a business and how they interact with users and other applications.<ref name=":4" />

== Impacts of data science ==

Big data is quickly becoming a vital tool for businesses of all sizes.<ref name=":5">{{Cite web|url=https://www.forbes.com/sites/peterpham/2015/08/28/the-impacts-of-big-data-that-you-may-not-have-heard-of/|title=The Impacts Of Big Data That You May Not Have Heard Of|last=Pham|first=Peter|website=Forbes|language=en|access-date=2020-04-03}}</ref> The availability and interpretation of big data have altered the business models of old industries and enabled the creation of new ones.<ref name=":5" /> Data-driven businesses were projected to be worth $1.2 trillion collectively in 2020, up from $333 billion in 2015.<ref name=":6">{{Cite web|url=https://towardsdatascience.com/how-data-science-will-impact-future-of-businesses-7f11f5699c4d|title=How Data Science will Impact Future of Businesses?|last=Martin|first=Sophia|date=2019-09-20|website=Medium|language=en|access-date=2020-04-03}}</ref> Data scientists are responsible for breaking down big data into usable information and creating software and algorithms that help companies and organizations determine optimal operations.<ref name=":6" /> Because data science is closely tied to big data, its impact grows along with that of big data.<ref name=":6" />
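
As a rough illustration (a derived figure, not one reported by the cited source), growth from $333 billion in 2015 to $1.2 trillion in 2020 corresponds to a compound annual growth rate (CAGR) of about 29% over those five years:
<math display="block">\text{CAGR} = \left(\frac{1.2 \times 10^{12}}{3.33 \times 10^{11}}\right)^{1/5} - 1 \approx 3.60^{1/5} - 1 \approx 0.29</math>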

== Technologies and techniques ==

Data science uses a variety of technologies and techniques, depending on the application.

=== Techniques ===

* [[Cluster analysis|Clustering]] groups data points so that points in the same group are more similar to one another than to points in other groups.
* [[Dimensionality reduction]] reduces the number of variables in a dataset, lowering the cost of computation so that analysis can be performed more quickly.
* [[Machine learning]] performs tasks by inferring patterns from data rather than by following explicitly programmed rules.
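
The list above does not name a particular clustering algorithm. As one common example, the following is a minimal pure-Python sketch of k-means clustering with Lloyd's iterations, assuming 2-D points given as (x, y) tuples and k no larger than the number of points. The seeding step is a deterministic farthest-first heuristic chosen here for reproducibility, not the random or k-means++ seeding that many libraries use, and the helper names (<code>farthest_first_init</code>, <code>kmeans</code>) are hypothetical.

```python
"""Minimal k-means (Lloyd's algorithm) sketch for 2-D points."""
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption for this sketch): start from
    points[0], then repeatedly add the point farthest from its nearest
    already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Cluster (x, y) tuples into k groups.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Converged: unchanged assignments mean unchanged centroids.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    centroids, labels = kmeans(pts, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each iteration alternates an assignment step and an update step; the loop stops as soon as the cluster assignments stop changing, because the centroids are then fixed as well.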

=== Technologies ===

* [[Python (programming language)|Python]] is a programming language with a simple syntax that is commonly used for data science.<ref>{{Cite web|url=https://sites.engineering.ucsb.edu/~shell/che210d/python.pdf|title=An introduction to Python for scientific computing|last=Shell|first=M Scott|date=September 24, 2019|access-date=April 2, 2020}}</ref> Widely used Python libraries for data science include NumPy, pandas, and SciPy.
* [[R (programming language)|R]] is a programming language and environment for statistical computing and graphics,<ref>{{Cite web|url=https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_003f|title=R FAQ|website=cran.r-project.org|access-date=2020-04-03}}</ref> widely used by statisticians and for data mining.
* [[TensorFlow]] is a framework developed by Google for creating machine learning models.
* [[Pytorch|PyTorch]] is another machine learning framework, originally developed by Facebook.
* [[Jupyter Notebook]] is an interactive, web-based computing environment, most often used with Python, that allows faster experimentation.
* [[Tableau Software|Tableau]] makes a variety of software used for data visualization.<ref>{{Cite journal|url=https://www.wired.com/2014/07/a-drag-and-drop-toolkit-that-lets-anyone-create-interactive-maps/|journal=Wired|access-date=2020-04-03|title=A Dead-Simple Tool That Lets Anyone Create Interactive Maps|date=15 July 2014|last1=Rhodes|first1=Margaret}}</ref>
* [[Apache Hadoop]] is a software framework for storing and processing large datasets across distributed clusters of computers.
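
Apache Hadoop's original processing model is MapReduce. The sketch below illustrates only the map, shuffle, and reduce phases of that model, in a single Python process, on a hypothetical word-count job; it does not use Hadoop or its API, and the function names are hypothetical. In a real Hadoop job, the framework runs many map and reduce tasks in parallel across a cluster and performs the shuffle over the network.

```python
"""Single-process sketch of the MapReduce model (map, shuffle, reduce)."""
from collections import defaultdict


def map_phase(document):
    # Map: emit a (key, value) pair for every word, here (word, 1).
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key, here by summing the counts.
    return key, sum(values)


def word_count(documents):
    # On a cluster, each document split would be mapped by a separate task.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "The end"]
    print(word_count(docs))  # {'the': 3, 'cat': 1, 'sat': 2, 'dog': 1, 'end': 1}
```

Because the map and reduce functions only see one record or one key at a time, the framework is free to distribute them across machines; that independence is what lets the model scale to large datasets.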

==References==

{{Reflist|35em}}

[[Category:Information science]]
[[Category:Computer occupations]]
[[Category:Computational fields of study]]
[[Category:Data analysis]]

<noinclude>

<small>This page was moved from [[wikipedia:en:Data science]]. Its edit history can be viewed at [[数据科学/edithistory]]</small></noinclude>

[[Category:待整理页面]]