
Added 158,635 bytes, 12 May 2020 (Tue) 17:54

Moved page from wikipedia:en:Machine learning (history)

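This entry was provisionally machine-translated by Caiyun Xiaoyi (彩云小译) and has not yet been edited or proofread by a human; we apologize for any inconvenience in reading.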

{{for|the journal|Machine Learning (journal)}}

{{redirect|Statistical learning|statistical learning in linguistics|statistical learning in language acquisition}}

{{short description|Scientific study of algorithms and statistical models that computer systems use to perform tasks without explicit instructions}}

{{Machine learning bar}}

'''Machine learning''' ('''ML''') is the study of computer algorithms that improve automatically through experience.<ref>http://www.cs.cmu.edu/~tom/mlbook.html</ref> It is seen as a subset of [[artificial intelligence]]. Machine learning algorithms build a [[mathematical model]] based on sample data, known as "[[training data]]", in order to make predictions or decisions without being explicitly programmed to do so.{{refn|The definition "without being explicitly programmed" is often attributed to [[Arthur Samuel]], who coined the term "machine learning" in 1959, but the phrase is not found verbatim in this publication, and may be a [[paraphrase]] that appeared later. Confer "Paraphrasing Arthur Samuel (1959), the question is: How can computers learn to solve problems without being explicitly programmed?" in {{Cite conference|title=Automated Design of Both the Topology and Sizing of Analog Electrical Circuits Using Genetic Programming|conference=Artificial Intelligence in Design '96|last=Koza|first=John R.|last2=Bennett|first2=Forrest H.|last3=Andre|first3=David|last4=Keane|first4=Martin A.|date=1996|publisher=Springer, Dordrecht|pages=151–170|language=en|doi=10.1007/978-94-009-0279-4_9}}}}<ref name="bishop2006" />{{rp|2}} Machine learning algorithms are used in a wide variety of applications, such as [[email filtering]] and [[computer vision]], where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks.


Machine learning is closely related to [[computational statistics]], which focuses on making predictions using computers. The study of [[mathematical optimization]] delivers methods, theory and application domains to the field of machine learning. [[Data mining]] is a related field of study, focusing on [[exploratory data analysis]] through [[unsupervised learning]].{{refn|Machine learning and pattern recognition "can be viewed as two facets of the same field."<ref name="bishop2006" />{{rp|vii}}}}<ref>{{cite journal |last=Friedman |first=Jerome H. |authorlink = Jerome H. Friedman|title=Data Mining and Statistics: What's the connection? |journal=Computing Science and Statistics |volume=29 |issue=1 |year=1998 |pages=3–9}}</ref> In its application across business problems, machine learning is also referred to as [[predictive analytics]].


== Overview ==

Machine learning involves computers discovering how they can perform tasks without being explicitly programmed to do so. For simple tasks assigned to computers, it is possible to program algorithms telling the machine how to execute all steps required to solve the problem at hand; on the computer's part, no learning is needed. For more advanced tasks, it can be challenging for a human to manually create the needed algorithms. In practice, it can turn out to be more effective to help the machine develop its own algorithm, rather than have human programmers specify every needed step.<ref name = "Alpaydin2020">{{cite book
| author = Ethem Alpaydin
| title = Introduction to Machine Learning
| year = 2020
| edition = Fourth
| pages = xix, 1-3, 13-18
| publisher = [[MIT]]
| ISBN = 0262043793
}}</ref><ref name="elements"/>

The discipline of machine learning employs various approaches to help computers learn to accomplish tasks where no fully satisfactory algorithm is available. In cases where vast numbers of potential answers exist, one approach is to label some of the correct answers as valid. These labelled answers can then be used as training data for the computer to improve the algorithm(s) it uses to determine correct answers. For example, to train a system for the task of digital character recognition, the [[MNIST database|MNIST]] dataset of handwritten digits has often been used.<ref name = "Alpaydin2020"/><ref name="elements"/>
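The idea of using labelled answers as training data can be sketched with a very small example. The code below is only an illustration under stated assumptions: the four labelled points and the function name <code>predict</code> are hypothetical, and a one-nearest-neighbour rule stands in for whatever learning algorithm a real system (for instance one trained on MNIST) would use.

```python
from math import dist

# Hypothetical labelled training data: (feature vector, label) pairs.
# A real task such as digit recognition would use many more examples.
TRAINING_DATA = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]


def predict(point, training_data=TRAINING_DATA):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(
        training_data, key=lambda example: dist(example[0], point)
    )
    return nearest_label


print(predict((1.2, 1.4)))  # closest to the examples labelled "A"
print(predict((5.5, 5.0)))  # closest to the examples labelled "B"
```

No rule for telling "A" from "B" is written by hand here; the behaviour comes entirely from the labelled examples, so adding or correcting examples changes the predictions.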

=== Machine learning approaches ===

{{Anchor|Algorithm types}}

Early classifications for machine learning approaches sometimes divided them into three broad categories, depending on the nature of the "signal" or "feedback" available to the learning system. These were:

* [[Supervised learning]]: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that [[Map (mathematics)|maps]] inputs to outputs.
* [[Unsupervised learning]]: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end ([[feature learning]]).
* [[Reinforcement learning]]: A computer program interacts with a dynamic environment in which it must achieve a certain goal (such as [[Autonomous car|driving a vehicle]] or playing a game against an opponent). As it navigates its problem space, the program is provided with feedback analogous to rewards, which it tries to maximise.<ref name="bishop2006"/>
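The difference in available feedback between the first two categories can be sketched on the same toy data. This is a minimal illustration, not a reference implementation: the data values and the function names <code>fit_threshold</code> and <code>two_means</code> are assumptions made for the example, and reinforcement learning is not covered by it.

```python
def fit_threshold(values, labels):
    """Supervised: use the labels to place a boundary midway between class means."""
    zeros = [v for v, y in zip(values, labels) if y == 0]
    ones = [v for v, y in zip(values, labels) if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def two_means(values, iterations=10):
    """Unsupervised: one-dimensional k-means with k=2; no labels are used.

    Assumes the values are not all identical, so neither cluster is empty.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high


values = [1.0, 1.2, 0.8, 4.0, 4.2, 3.8]
labels = [0, 0, 0, 1, 1, 1]  # the "teacher" signal, used only by fit_threshold

threshold = fit_threshold(values, labels)
centres = two_means(values)
print("supervised boundary:", threshold)
print("unsupervised cluster centres:", centres)
```

The supervised function cannot run without <code>labels</code>, while the unsupervised one recovers the two groups from the structure of <code>values</code> alone.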

Other approaches or processes have since developed that don't fit neatly into this three-fold categorisation, and sometimes more than one is used by the same machine learning system. Examples include [[topic modeling]], [[dimensionality reduction]] and [[Meta learning (computer science)|meta learning]].<ref>{{cite book
| authors = Pavel Brazdil, Christophe Giraud Carrier, Carlos Soares, Ricardo Vilalta
| title = Metalearning: Applications to Data Mining
| year = 2009
| edition = Fourth
| pages = 10-14, ''passim''
| publisher = [[Springer Science+Business Media]]
| ISBN = 3540732624
}}</ref> As of 2020, [[deep learning]] has become the dominant approach for much ongoing work in the field of machine learning.<ref name = "Alpaydin2020"/>

== History and relationships to other fields ==

{{see also|Timeline of machine learning}}

The term ''machine learning'' was coined in 1959 by [[Arthur Samuel]], an American [[IBMer]] and pioneer in the field of [[computer gaming]] and [[artificial intelligence]].<ref name="Samuel">{{Cite journal|last=Samuel|first=Arthur|date=1959|title=Some Studies in Machine Learning Using the Game of Checkers|journal=IBM Journal of Research and Development|volume=3|issue=3|pages=210–229|doi=10.1147/rd.33.0210|citeseerx=10.1.1.368.2254}}</ref><ref>R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, no. 2–3, pp. 271–274, 1998.</ref> A representative book of machine learning research during the 1960s was Nilsson's ''Learning Machines'', dealing mostly with machine learning for pattern classification.<ref> Nilsson N. Learning Machines, McGraw Hill, 1965. </ref> Interest related to pattern recognition continued into the 1970s, as described by Duda and Hart in 1973.<ref> Duda, R., Hart P. Pattern Recognition and Scene Analysis, Wiley Interscience, 1973 </ref> In 1981 a report was given on using teaching strategies so that a neural network learns to recognize 40 characters (26 letters, 10 digits, and 4 special symbols) from a computer terminal.<ref> S. Bozinovski "Teaching space: A representation concept for adaptive pattern classification" COINS Technical Report No. 81-28, Computer and Information Science Department, University of Massachusetts at Amherst, MA, 1981. https://web.cs.umass.edu/publication/docs/1981/UM-CS-1981-028.pdf </ref>

[[Tom M. Mitchell]] provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: "A computer program is said to learn from experience ''E'' with respect to some class of tasks ''T'' and performance measure ''P'' if its performance at tasks in ''T'', as measured by ''P'', improves with experience ''E''."<ref name="Mitchell-1997">{{cite book
|author=Mitchell, T.
|title=Machine Learning
|publisher=McGraw Hill
|isbn= 978-0-07-042807-2
|pages=2
|year=1997}}</ref> This definition of the tasks with which machine learning is concerned offers a fundamentally [[operational definition]] rather than defining the field in cognitive terms. This follows [[Alan Turing]]'s proposal in his paper "[[Computing Machinery and Intelligence]]", in which the question "Can machines think?" is replaced with the question "Can machines do what we (as thinking entities) can do?".<ref>{{Citation |chapterurl=http://eprints.ecs.soton.ac.uk/12954/ |first=Stevan |last=Harnad |authorlink=Stevan Harnad |year=2008 |chapter=The Annotation Game: On Turing (1950) on Computing, Machinery, and Intelligence |editor1-last=Epstein |editor1-first=Robert |editor2-last=Peters |editor2-first=Grace |title=The Turing Test Sourcebook: Philosophical and Methodological Issues in the Quest for the Thinking Computer |pages=23–66 |location= |publisher=Kluwer |isbn= 9781402067082}}</ref>
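Mitchell's three ingredients can be made concrete with a small sketch. Everything specific below is an assumption chosen for illustration: the task ''T'' is deciding whether a number lies above a hidden cut-off (7.0 here), the performance measure ''P'' is accuracy on a fixed test set, and the experience ''E'' is a short list of labelled examples. The data were picked so that accuracy rises as examples are added; such steady improvement is not guaranteed in general.

```python
def true_label(x):
    """Ground truth for task T (unknown to the learner): is x at least 7.0?"""
    return 1 if x >= 7.0 else 0


# Fixed test set used by the performance measure P: 0.5, 1.5, ..., 9.5
TEST_SET = [0.5 + i for i in range(10)]

# Experience E: labelled examples, revealed to the learner one at a time.
EXPERIENCE = [(1.0, 0), (9.0, 1), (4.0, 0), (6.0, 0)]


def learn_threshold(examples):
    """Place the decision threshold midway between the two classes seen so far."""
    largest_negative = max(x for x, y in examples if y == 0)
    smallest_positive = min(x for x, y in examples if y == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum(
        (1 if x >= threshold else 0) == true_label(x) for x in TEST_SET
    )
    return correct / len(TEST_SET)


# Measure P after the learner has seen 2, 3 and 4 examples.
history = []
for n in range(2, len(EXPERIENCE) + 1):
    history.append(accuracy(learn_threshold(EXPERIENCE[:n])))

print(history)  # [0.8, 0.9, 1.0]
```

In the sense of the definition, the program "learns" because its measured performance on the task improves as its experience grows.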

=== Relation to artificial intelligence ===

As a scientific endeavor, machine learning grew out of the quest for artificial intelligence. In the early days of AI as an [[Discipline (academia)|academic discipline]], some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what were then termed "[[neural network]]s"; these were mostly [[perceptron]]s and [[ADALINE|other models]] that were later found to be reinventions of the [[generalized linear model]]s of statistics.<ref>{{cite citeseerx |last1=Sarle |first1=Warren |title=Neural Networks and statistical models |citeseerx=10.1.1.27.699 |year=1994}}</ref> [[Probability theory|Probabilistic]] reasoning was also employed, especially in automated [[medical diagnosis]].<ref name="aima">{{cite AIMA|edition=2}}</ref>{{rp|488}}


However, an increasing emphasis on the [[GOFAI|logical, knowledge-based approach]] caused a rift between AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems of data acquisition and representation.<ref name="aima" />{{rp|488}} By 1980, [[expert system]]s had come to dominate AI, and statistics was out of favor.<ref name="changing">{{Cite journal | last1 = Langley | first1 = Pat| title = The changing science of machine learning | doi = 10.1007/s10994-011-5242-y | journal = [[Machine Learning (journal)|Machine Learning]]| volume = 82 | issue = 3 | pages = 275–279 | year = 2011 | pmid = | pmc = | doi-access = free }}</ref> Work on symbolic/knowledge-based learning did continue within AI, leading to [[inductive logic programming]], but the more statistical line of research was now outside the field of AI proper, in [[pattern recognition]] and [[information retrieval]].<ref name="aima" />{{rp|708–710; 755}} Neural networks research had been abandoned by AI and [[computer science]] around the same time. This line, too, was continued outside the AI/CS field, as "[[connectionism]]", by researchers from other disciplines including [[John Hopfield|Hopfield]], [[David Rumelhart|Rumelhart]] and [[Geoff Hinton|Hinton]]. Their main success came in the mid-1980s with the reinvention of [[backpropagation]].<ref name="aima" />{{rp|25}}


Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the [[symbolic artificial intelligence|symbolic approaches]] it had inherited from AI, and toward methods and models borrowed from statistics and [[probability theory]].<ref name="changing" /> As of 2019, many sources continue to assert that machine learning remains a subfield of AI. Yet some practitioners, for example Dr [[Daniel J. Hulme|Daniel Hulme]], who both teaches AI and runs a company operating in the field, argue that machine learning and AI are separate.<ref name="elements">
{{cite web
|url= https://course.elementsofai.com/
|title= The Elements of AI
|publisher= [[University of Helsinki]]
|date = Dec 2019
|accessdate=7 April 2020}}
</ref><ref>
{{cite web
|url= https://www.techworld.com/tech-innovation/satalia-ceo-no-one-is-doing-ai-optimisation-can-change-that-3775689/
|title= Satalia CEO Daniel Hulme has a plan to overcome the limitations of machine learning
|publisher= [[Techworld]]
|date = October 2019
|accessdate=7 April 2020}}
</ref><ref name = "Alpaydin2020"/>

=== Relation to data mining ===

Machine learning and [[data mining]] often employ the same methods and overlap significantly, but while machine learning focuses on prediction, based on ''known'' properties learned from the training data, [[data mining]] focuses on the [[discovery (observation)|discovery]] of (previously) ''unknown'' properties in the data (this is the analysis step of [[knowledge discovery]] in databases). Data mining uses many machine learning methods, but with different goals; on the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, [[ECML PKDD]] being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to ''reproduce known'' knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously ''unknown'' knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.


=== Relation to optimization ===

Machine learning also has intimate ties to [[Mathematical optimization|optimization]]: many learning problems are formulated as minimization of some [[loss function]] on a training set of examples. Loss functions express the discrepancy between the predictions of the model being trained and the actual problem instances (for example, in classification, one wants to assign a label to instances, and models are trained to correctly predict the pre-assigned labels of a set of examples). The difference between the two fields arises from the goal of generalization: while optimization algorithms can minimize the loss on a training set, machine learning is concerned with minimizing the loss on unseen samples.<ref>{{cite encyclopedia |last1=Le Roux |first1=Nicolas |first2=Yoshua |last2=Bengio |first3=Andrew |last3=Fitzgibbon |title=Improving First and Second-Order Methods by Modeling Uncertainty |encyclopedia=Optimization for Machine Learning |year=2012 |page=404 |editor1-last=Sra |editor1-first=Suvrit |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |publisher=MIT Press|url=https://books.google.com/?id=JPQx7s2L1A8C&pg=PA403&dq="Improving+First+and+Second-Order+Methods+by+Modeling+Uncertainty|isbn=9780262016469 }}</ref>

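The distinction between minimizing loss on a training set and caring about loss on unseen samples can be sketched as follows. This is a minimal illustration under assumptions not found in the text above: a one-parameter model <code>y = w * x</code>, mean squared error as the loss function, plain gradient descent as the optimizer, and made-up data values.

```python
def mse(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, learning_rate=0.01, steps=200):
    """Minimize the training loss by gradient descent, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w


# Hypothetical data: roughly y = 2x with some noise.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
unseen_x, unseen_y = [5.0, 6.0], [10.4, 11.5]

w = fit(train_x, train_y)
train_loss = mse(w, train_x, train_y)
unseen_loss = mse(w, unseen_x, unseen_y)

print("learned w:", round(w, 4))
print("training loss:", round(train_loss, 4))
print("loss on unseen samples:", round(unseen_loss, 4))
```

The optimizer only ever sees <code>train_x</code> and <code>train_y</code>; the quantity machine learning is ultimately concerned with is <code>unseen_loss</code>, which the optimizer never touches directly.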

=== Relation to statistics ===

Machine learning and [[statistics]] are closely related fields in terms of methods, but distinct in their principal goal: statistics draws population [[Statistical inference|inferences]] from a [[Sample (statistics)|sample]], while machine learning finds generalizable predictive patterns.<ref>{{cite journal |first=Danilo |last=Bzdok |first2=Naomi |last2=Altman |authorlink2=Naomi Altman |first3=Martin |last3=Krzywinski |title=Statistics versus Machine Learning |journal=[[Nature Methods]] |volume=15 |issue=4 |pages=233–234 |year=2018 |doi=10.1038/nmeth.4642 |pmid=30100822 |pmc=6082636 }}</ref> According to [[Michael I. Jordan]], the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.<ref name="mi jordan ama">{{cite web|url=https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckelmtt?context=3 |title=statistics and machine learning|publisher=reddit|date=2014-09-10|accessdate=2014-10-01|language=|author = Michael I. Jordan|author-link=Michael I. Jordan}}</ref> He also suggested the term [[data science]] as a placeholder to call the overall field.<ref name="mi jordan ama" />


[[Leo Breiman]] distinguished two statistical modeling paradigms: data model and algorithmic model,<ref>{{cite web|url=http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726|title=Breiman: Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author)|author=Cornell University Library|accessdate=8 August 2015}}</ref> wherein "algorithmic model" means more or less machine learning algorithms such as [[Random forest]].

Some statisticians have adopted methods from machine learning, leading to a combined field that they call ''statistical learning''.<ref name="islr">{{cite book |author1=Gareth James |author2=Daniela Witten |author3=Trevor Hastie |author4=Robert Tibshirani |title=An Introduction to Statistical Learning |publisher=Springer |year=2013 |url=http://www-bcf.usc.edu/~gareth/ISL/ |page=vii}}</ref>


== {{anchor|Generalization}} Theory ==

{{Main|Computational learning theory|Statistical learning theory}}

A core objective of a learner is to generalize from its experience.<ref name="bishop2006">{{citation|first= C. M. |last= Bishop |authorlink=Christopher M. Bishop |year=2006 |title=Pattern Recognition and Machine Learning |publisher=Springer |isbn=978-0-387-31073-2}}</ref><ref>{{Cite Mehryar Afshin Ameet 2012}}</ref> Generalization in this context is the ability of a learning machine to perform accurately on new, unseen examples/tasks after having experienced a learning data set. The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences) and the learner has to build a general model about this space that enables it to produce sufficiently accurate predictions in new cases.

The computational analysis of machine learning algorithms and their performance is a branch of [[theoretical computer science]] known as [[computational learning theory]]. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common. The [[bias–variance decomposition]] is one way to quantify generalization [[Errors and residuals|error]].

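As an illustration of the bias–variance decomposition mentioned above, the following is the standard form for squared-error loss. The notation is an assumption made for this sketch and does not come from the cited sources: the target is <code>y = f(x) + ε</code> with zero-mean noise of variance <code>σ²</code>, and <code>f̂(x; D)</code> is the predictor obtained from a random training set <code>D</code> drawn independently of the noise.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}(x; D)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}(x; D)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}(x; D) - \mathbb{E}_D[\hat{f}(x; D)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term measures how far the average prediction is from the true function, the second how much the prediction varies across training sets, and the third cannot be reduced by any learner.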

For the best performance in the context of generalization, the complexity of the hypothesis should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has underfitted the data. If the complexity of the model is increased in response, then the training error decreases. But if the hypothesis is too complex, then the model is subject to [[overfitting]] and generalization will be poorer.<ref name="alpaydin">{{Cite book |author=Alpaydin, Ethem |title=Introduction to Machine Learning |url=https://archive.org/details/introductiontoma00alpa_0 |year=2010 |publisher=The MIT Press |place=London |isbn=978-0-262-01243-0 |access-date=4 February 2017 |url-access=registration }}</ref>

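The relationship between hypothesis complexity and error described above can be explored with a small experiment. The sketch below is only an illustration under assumed conditions: the underlying function (a sine wave), the noise level, the sample sizes and the piecewise-constant model family are all invented for this example, and the number of bins stands in for "complexity".

```python
import math
import random

def fit_bins(xs, ys, n_bins):
    """Fit a piecewise-constant model on [0, 1): one mean value per bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    fallback = sum(ys) / len(ys)  # used for bins that received no training points
    return [s / c if c else fallback for s, c in zip(sums, counts)]

def predict(model, x):
    n_bins = len(model)
    return model[min(int(x * n_bins), n_bins - 1)]

def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def sample(n, rng):
    """Draw n noisy observations of the assumed underlying function."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = sample(40, rng)
test_x, test_y = sample(400, rng)

results = {}
for n_bins in (1, 4, 32):  # increasing model complexity
    model = fit_bins(train_x, train_y, n_bins)
    results[n_bins] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(n_bins, "bins: train/test MSE =", results[n_bins])
```

Because each finer set of bins subdivides the coarser one, the training error cannot increase as bins are added. The error on the held-out sample is the quantity to watch for signs of underfitting (too few bins) or overfitting (too many bins); its exact values depend on the random sample.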

In addition to performance bounds, learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in [[Time complexity#Polynomial time|polynomial time]]. There are two kinds of [[time complexity]] results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.

== Approaches ==

=== Types of learning algorithms ===

The types of machine learning algorithms differ in their approach, the type of data they input and output, and the type of task or problem that they are intended to solve.

==== Supervised learning ====

{{Main|Supervised learning}}

[[File:Svm max sep hyperplane with margin.png|thumb|A [[support vector machine]] is a supervised learning model that divides the data into regions separated by a [[linear classifier|linear boundary]]. Here, the linear boundary divides the black circles from the white.]]

Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs.<ref>{{cite book |last1=Russell |first1=Stuart J. |last2=Norvig |first2=Peter |title=Artificial Intelligence: A Modern Approach |date=2010 |publisher=Prentice Hall |isbn=9780136042594 |edition=Third|title-link=Artificial Intelligence: A Modern Approach }}</ref> The data is known as [[training data]], and consists of a set of training examples. Each training example has one or more inputs and the desired output, also known as a supervisory signal. In the mathematical model, each training example is represented by an [[array data structure|array]] or vector, sometimes called a feature vector, and the training data is represented by a [[Matrix (mathematics)|matrix]]. Through iterative optimization of an [[Loss function|objective function]], supervised learning algorithms learn a function that can be used to predict the output associated with new inputs.<ref>{{cite book |last1=Mohri |first1=Mehryar |last2=Rostamizadeh |first2=Afshin |last3=Talwalkar |first3=Ameet |title=Foundations of Machine Learning |date=2012 |publisher=The MIT Press |isbn=9780262018258}}</ref> An optimal function will allow the algorithm to correctly determine the output for inputs that were not a part of the training data. An algorithm that improves the accuracy of its outputs or predictions over time is said to have learned to perform that task.<ref name="Mitchell-1997" />

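The ingredients named above (feature vectors, a training matrix, an objective function and its iterative optimization) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the data-generating rule, the learning rate and the step count are assumptions chosen for the example, and the objective is mean squared error minimised by batch gradient descent.

```python
# Training data: each row of X is one feature vector; y holds the desired outputs.
# The outputs follow the made-up rule y = 2*x1 - 3*x2 + 1 (an assumption for illustration).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = [2 * x1 - 3 * x2 + 1 for x1, x2 in X]

def predict(weights, bias, features):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def train(X, y, learning_rate=0.1, steps=2000):
    """Minimise the mean squared error by batch gradient descent."""
    n = len(X)
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(steps):
        errors = [predict(weights, bias, row) - target for row, target in zip(X, y)]
        grad_w = [2 / n * sum(e * row[j] for e, row in zip(errors, X))
                  for j in range(len(weights))]
        grad_b = 2 / n * sum(errors)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

weights, bias = train(X, y)
new_input = [3.0, 2.0]  # not part of the training data
prediction = predict(weights, bias, new_input)
print(prediction)
```

The learned function is then applied to an input that was not in the training matrix, which is the sense in which the model predicts outputs for new inputs.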

Types of supervised learning algorithms include [[Active learning (machine learning)|active learning]], [[Statistical classification|classification]] and [[Regression analysis|regression]].<ref>{{cite book|last=Alpaydin|first=Ethem|title=Introduction to Machine Learning|date=2010|publisher=MIT Press|isbn=978-0-262-01243-0|page=9|url=https://books.google.com/books?id=7f5bBAAAQBAJ&printsec=frontcover#v=onepage&q=classification&f=false}}</ref> Classification algorithms are used when the outputs are restricted to a limited set of values, and regression algorithms are used when the outputs may have any numerical value within a range. As an example, for a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email.

[[Similarity learning]] is an area of supervised machine learning closely related to regression and classification, but the goal is to learn from examples using a similarity function that measures how similar or related two objects are. It has applications in [[ranking]], [[recommendation systems]], visual identity tracking, face verification, and speaker verification.

==== Unsupervised learning ====

{{Main|Unsupervised learning}}{{See also|Cluster analysis}}

Unsupervised learning algorithms take a set of data that contains only inputs, and find structure in the data, like grouping or clustering of data points. The algorithms, therefore, learn from data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. A central application of unsupervised learning is in the field of [[density estimation]] in [[statistics]], such as finding the [[probability density function]].<ref name="JordanBishop2004">{{cite book |first1=Michael I. |last1=Jordan |first2=Christopher M. |last2=Bishop |chapter=Neural Networks |editor=Allen B. Tucker |title=Computer Science Handbook, Second Edition (Section VII: Intelligent Systems) |location=Boca Raton, Florida |publisher=Chapman & Hall/CRC Press LLC |year=2004 |isbn=978-1-58488-360-9 }}</ref> Unsupervised learning also encompasses other domains that involve summarizing and explaining data features.

Cluster analysis is the assignment of a set of observations into subsets (called ''clusters'') so that observations within the same cluster are similar according to one or more predesignated criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some ''similarity metric'' and evaluated, for example, by ''internal compactness'', or the similarity between members of the same cluster, and ''separation'', the difference between clusters. Other methods are based on ''estimated density'' and ''graph connectivity''.

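As a concrete instance of clustering with a similarity metric, the sketch below runs plain k-means (Lloyd's algorithm) with Euclidean distance and then reports one possible measure each of internal compactness and separation. The sample points, the starting centroids and the two evaluation formulas are assumptions made for this illustration; other clustering methods and other evaluation criteria are equally valid.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) using Euclidean distance."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical observations forming two visibly distinct groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, [points[0], points[3]])

# Internal compactness: mean distance from each point to its own centroid.
compactness = sum(math.dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)
# Separation: distance between the two cluster centroids.
separation = math.dist(centroids[0], centroids[1])
print(labels, compactness, separation)
```

A good clustering under these two criteria has small compactness and large separation. The starting centroids are picked by hand here to keep the run deterministic; practical implementations usually randomise or otherwise optimise the initialisation.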

==== Semi-supervised learning ====

{{Main|Semi-supervised learning}}

Semi-supervised learning falls between [[unsupervised learning]] (without any labeled training data) and [[supervised learning]] (with completely labeled training data). Some of the training examples are missing training labels, yet many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy.

In [[Weak supervision|weakly supervised learning]], the training labels are noisy, limited, or imprecise; however, these labels are often cheaper to obtain, resulting in larger effective training sets.<ref>{{Cite web|url=https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html|title=Weak Supervision: The New Programming Paradigm for Machine Learning|author1=Alex Ratner |author2=Stephen Bach |author3=Paroma Varma |author4=Chris |others= referencing work by many other members of Hazy Research|website=hazyresearch.github.io|access-date=2019-06-06}}</ref>

==== Reinforcement learning ====

{{Main|Reinforcement learning}}

Reinforcement learning is an area of machine learning concerned with how [[software agent]]s ought to take [[Action selection|actions]] in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as [[game theory]], [[control theory]], [[operations research]], [[information theory]], [[simulation-based optimization]], [[multi-agent system]]s, [[swarm intelligence]], [[statistics]] and [[genetic algorithm]]s. In machine learning, the environment is typically represented as a [[Markov Decision Process]] (MDP). Many reinforcement learning algorithms use [[dynamic programming]] techniques.<ref>{{Cite book|title=Reinforcement learning and markov decision processes|author1=van Otterlo, M.|author2=Wiering, M.|journal=Reinforcement Learning |volume=12|pages=3–42 |year=2012 |doi=10.1007/978-3-642-27645-3_1|series=Adaptation, Learning, and Optimization|isbn=978-3-642-27644-6}}</ref> Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.

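To make the idea of maximizing cumulative reward without a known model of the MDP more concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. Everything specific in it is an assumption for illustration: the five-state corridor environment, the reward of 1 for reaching the last state, and the values of the learning rate, discount factor and exploration rate. The agent never reads the transition rule directly; it only observes the transitions returned by <code>step</code>.

```python
import random

N_STATES = 5                  # states 0..4; state 4 is terminal and rewarding
ACTIONS = ("left", "right")

def step(state, action):
    """Hypothetical environment; the agent only sees sampled transitions."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def greedy(q, state, rng):
    """Pick an action with the highest estimated value, breaking ties at random."""
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The learned table assigns a higher value to moving right in every non-terminal state, so the greedy policy heads for the rewarding state. The update rule uses the same recursive value relationship that dynamic programming exploits, but applies it to sampled experience instead of a known model.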

==== Self learning ====

Self-learning as a machine learning paradigm was introduced in 1982 along with a neural network capable of self-learning, named the Crossbar Adaptive Array (CAA).<ref> Bozinovski, S. (1982). "A self-learning system using secondary reinforcement". In Trappl, Robert (ed.). Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North Holland. pp. 397–402. {{ISBN|978-0-444-86488-8}}.</ref> It is learning with no external rewards and no external teacher advice. The CAA self-learning algorithm computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about consequence situations. The system is driven by the interaction between cognition and emotion.<ref>Bozinovski, Stevo (2014) "Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981." Procedia Computer Science p. 255-263 </ref>

In each iteration, the self-learning algorithm updates a memory matrix W = ||w(a,s)|| by executing the following routine:
# In situation s, perform action a.
# Receive the consequence situation s’.
# Compute the emotion of being in the consequence situation, v(s’).
# Update the crossbar memory: w’(a,s) = w(a,s) + v(s’).

It is a system with only one input, situation s, and only one output, action (or behavior) a. There is neither a separate reinforcement input nor an advice input from the environment. The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments: the behavioral environment, where it behaves, and the genetic environment, from which it receives, once and only at the start, initial emotions about situations to be encountered in the behavioral environment. After receiving the genome (species) vector from the genetic environment, the CAA learns a goal-seeking behavior in an environment that contains both desirable and undesirable situations.<ref> Bozinovski, S. (2001) "Self-learning agents: A connectionist theory of emotion based on crossbar value judgment." Cybernetics and Systems 32(6) 637-667. </ref>

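The four-step self-learning routine described in this section can be sketched as follows. This is a loose illustration and not the published CAA implementation. Several details are assumptions: the tiny three-situation environment, the rule of choosing the action with the largest w(a,s), and the use of a fixed emotion vector as a stand-in for the initial emotions received from the genetic environment.

```python
# Hypothetical toy setting (not from the source): 3 situations, 2 actions.
# TRANSITIONS[s][a] is the consequence situation s' of taking action a in situation s.
TRANSITIONS = [[1, 2], [0, 0], [0, 0]]
# Assumed initial emotions v(s), standing in for the genome vector:
# situation 1 is undesirable, situation 2 is desirable, situation 0 is neutral.
EMOTION = [0.0, -1.0, 1.0]

def run_caa(steps, start=0):
    n_actions, n_situations = 2, 3
    # Crossbar memory W = ||w(a, s)||, stored as w[a][s].
    w = [[0.0] * n_situations for _ in range(n_actions)]
    s = start
    for _ in range(steps):
        # 1. In situation s perform action a
        #    (assumed rule: largest w(a, s), ties go to the lowest action index).
        a = max(range(n_actions), key=lambda action: w[action][s])
        # 2. Receive the consequence situation s'.
        s_next = TRANSITIONS[s][a]
        # 3. Compute the emotion of being in the consequence situation, v(s').
        v = EMOTION[s_next]
        # 4. Update the crossbar memory: w'(a, s) = w(a, s) + v(s').
        w[a][s] += v
        s = s_next
    return w

w = run_caa(steps=10)
print(w)
```

In this toy run the agent tries the action leading to the undesirable situation once, after which the memory entry for that action is negative and the agent consistently prefers the action leading to the desirable situation. No reward or advice signal is passed in from outside; the only learning signal is the emotion associated with the consequence situation.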

==== Feature learning ====

{{Main|Feature learning}}

Several learning algorithms aim at discovering better representations of the inputs provided during training.<ref name="pami">{{cite journal |author1=Y. Bengio |author2=A. Courville |author3=P. Vincent |title=Representation Learning: A Review and New Perspectives |journal= IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|doi=10.1109/tpami.2013.50 |pmid=23787338 |volume=35 |issue=8 |pages=1798–1828|arxiv=1206.5538 }}</ref> Classic examples include [[principal components analysis]] and cluster analysis. Feature learning algorithms, also called representation learning algorithms, often attempt to preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. This technique allows reconstruction of the inputs coming from the unknown data-generating distribution, while not being necessarily faithful to configurations that are implausible under that distribution. This replaces manual [[feature engineering]], and allows a machine to both learn the features and use them to perform a specific task.

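As an illustration of learning a representation as a pre-processing step, the sketch below computes the first principal component of a small two-dimensional data set and replaces each point with a single learned feature, its coordinate along that component. The data points are invented for the example, and power iteration is just one simple way to obtain the leading eigenvector of the covariance matrix; numerical libraries typically use other decompositions.

```python
import math

def first_principal_component(points, iterations=100):
    """Leading eigenvector of the 2x2 sample covariance matrix, via power iteration."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    vx, vy = 1.0, 0.0  # arbitrary starting direction
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalise.
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
    return (vx, vy), centered

# Hypothetical inputs lying close to the line y = 2x.
points = [(-2.0, -3.9), (-1.0, -2.1), (0.0, 0.0), (1.0, 2.1), (2.0, 3.9)]
component, centered = first_principal_component(points)

# Learned one-dimensional representation: projection onto the component.
features = [x * component[0] + y * component[1] for x, y in centered]
print(component, features)
```

Each two-dimensional input is now described by one number that preserves most of its variation, and an approximate reconstruction of the centered input is available by multiplying that number by the component. A downstream classifier or regressor could consume <code>features</code> instead of the raw inputs.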

Feature learning can be either supervised or unsupervised. In supervised feature learning, features are learned using labeled input data. Examples include [[artificial neural network]]s, [[multilayer perceptron]]s, and supervised [[dictionary learning]]. In unsupervised feature learning, features are learned with unlabeled input data. Examples include dictionary learning, [[independent component analysis]], [[autoencoder]]s, [[matrix decomposition|matrix factorization]]<ref>{{cite conference |author1=Nathan Srebro |author2=Jason D. M. Rennie |author3=Tommi S. Jaakkola |title=Maximum-Margin Matrix Factorization |conference=[[Conference on Neural Information Processing Systems|NIPS]] |year=2004}}</ref> and various forms of [[Cluster analysis|clustering]].<ref name="coates2011">{{cite conference
|last1 = Coates
|first1 = Adam
|last2 = Lee
|first2 = Honglak
|last3 = Ng
|first3 = Andrew Y.
|title = An analysis of single-layer networks in unsupervised feature learning
|conference = Int'l Conf. on AI and Statistics (AISTATS)
|year = 2011
|url = http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CoatesNL11.pdf
|access-date = 2018-11-25
|archive-url = https://web.archive.org/web/20170813153615/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CoatesNL11.pdf
|archive-date = 2017-08-13
|url-status = dead
}}</ref><ref>{{cite conference |last1 = Csurka |first1 = Gabriella|last2 = Dance |first2 = Christopher C.|last3 = Fan |first3 = Lixin|last4 = Willamowski |first4 = Jutta|last5 = Bray |first5 = Cédric|title = Visual categorization with bags of keypoints|conference = ECCV Workshop on Statistical Learning in Computer Vision|year = 2004|url = https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/csurka-eccv-04.pdf}}</ref><ref name="jurafsky">{{cite book |title=Speech and Language Processing |author1=Daniel Jurafsky |author2=James H. Martin |publisher=Pearson Education International |year=2009 |pages=145–146}}</ref>

[[Manifold learning]] algorithms attempt to do so under the constraint that the learned representation is low-dimensional. [[Sparse coding]] algorithms attempt to do so under the constraint that the learned representation is sparse, meaning that the mathematical model has many zeros. [[Multilinear subspace learning]] algorithms aim to learn low-dimensional representations directly from [[tensor]] representations for multidimensional data, without reshaping them into higher-dimensional vectors.<ref>{{cite journal |first1=Haiping |last1=Lu |first2=K.N. |last2=Plataniotis |first3=A.N. |last3=Venetsanopoulos |url=http://www.dsp.utoronto.ca/~haiping/Publication/SurveyMSL_PR2011.pdf |title=A Survey of Multilinear Subspace Learning for Tensor Data |journal=Pattern Recognition |volume=44 |number=7 |pages=1540–1551 |year=2011 |doi=10.1016/j.patcog.2011.01.004}}</ref> [[Deep learning]] algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.<ref>{{cite book | title = Learning Deep Architectures for AI | author = Yoshua Bengio | publisher = Now Publishers Inc. | year = 2009 | isbn = 978-1-60198-294-0 | pages = 1–3 | url = https://books.google.com/books?id=cq5ewg7FniMC&pg=PA3| author-link = Yoshua Bengio }}</ref>

Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensory data has not yielded to attempts to algorithmically define specific features. An alternative is to discover such features or representations through examination, without relying on explicit algorithms.

==== Sparse dictionary learning ====

{{Main|Sparse dictionary learning}}

Sparse dictionary learning is a feature learning method where a training example is represented as a linear combination of [[basis function]]s, and is assumed to be a [[sparse matrix]]. The method is [[strongly NP-hard]] and difficult to solve approximately.<ref>{{cite journal |first=A. M. |last=Tillmann |title=On the Computational Intractability of Exact and Approximate Dictionary Learning |journal=IEEE Signal Processing Letters |volume=22 |issue=1 |year=2015 |pages=45–49 |doi=10.1109/LSP.2014.2345761|bibcode=2015ISPL...22...45T |arxiv=1405.6664 }}</ref> A popular [[heuristic]] method for sparse dictionary learning is the [[K-SVD]] algorithm. Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine the class to which a previously unseen example belongs. When a dictionary has already been built for each class, a new example is associated with the class whose dictionary gives the best sparse representation of it. Sparse dictionary learning has also been applied in [[image de-noising]]. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.<ref>Aharon, M, M Elad, and A Bruckstein. 2006. "[http://sites.fas.harvard.edu/~cs278/papers/ksvd.pdf K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation]." Signal Processing, IEEE Transactions on 54 (11): 4311–4322</ref>

==== Anomaly detection ====

{{Main|Anomaly detection}}

In [[data mining]], anomaly detection, also known as outlier detection, is the identification of rare items, events or observations that raise suspicions by differing significantly from the majority of the data.<ref name=":0">{{Citation|last=Zimek|first=Arthur|title=Outlier Detection|date=2017|encyclopedia=Encyclopedia of Database Systems|pages=1–5|publisher=Springer New York|language=en|doi=10.1007/978-1-4899-7993-3_80719-1|isbn=9781489979933|last2=Schubert|first2=Erich}}</ref> Typically, the anomalous items represent an issue such as [[bank fraud]], a structural defect, medical problems or errors in a text. Anomalies are also referred to as [[outlier]]s, novelties, noise, deviations and exceptions.<ref>{{cite journal | last1 = Hodge | first1 = V. J. | last2 = Austin | first2 = J. | doi = 10.1007/s10462-004-4304-y | title = A Survey of Outlier Detection Methodologies | journal = Artificial Intelligence Review| volume = 22 | issue = 2 | pages = 85–126 | year = 2004 | url = http://eprints.whiterose.ac.uk/767/1/hodgevj4.pdf| pmid = | pmc = | citeseerx = 10.1.1.318.4023 }}</ref>

In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular, unsupervised algorithms) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.<ref>{{cite journal| first=Paul | last=Dokas | first2=Levent |last2=Ertoz |first3=Vipin |last3=Kumar |first4=Aleksandar |last4=Lazarevic |first5=Jaideep |last5=Srivastava |first6=Pang-Ning |last6=Tan | title=Data mining for network intrusion detection | year=2002 | journal=Proceedings NSF Workshop on Next Generation Data Mining | url=http://www.csee.umbc.edu/~kolari1/Mining/ngdm/dokas.pdf}}</ref>

Three broad categories of anomaly detection techniques exist.<ref name="ChandolaSurvey">{{cite journal |last1=Chandola |first1=V. |last2=Banerjee |first2=A. |last3=Kumar |first3=V. |year=2009 |title=Anomaly detection: A survey|journal=[[ACM Computing Surveys]]|volume=41|issue=3|pages=1–58|doi=10.1145/1541880.1541882|url=https://www.semanticscholar.org/paper/71d1ac92ad36b62a04f32ed75a10ad3259a7218d }}</ref> Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit the remainder of the data set least well. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set and then estimate the likelihood that a test instance was generated by that model.
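
The semi-supervised variant can be sketched in a few lines of Python. This is a minimal illustration, assuming one-dimensional data and a Gaussian model of normal behavior; the distance from the mean in standard deviations stands in for the likelihood, since under a Gaussian a larger distance means a lower likelihood. The function names, the data and the threshold of 3 are hypothetical.

```python
import statistics


def fit_normal_model(normal_values):
    """Model 'normal behavior' as a Gaussian fitted to known-normal data."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def anomaly_score(x, model):
    """Distance from the mean in standard deviations (higher = less likely)."""
    mean, std = model
    return abs(x - mean) / std


# Hypothetical training set containing only normal readings.
normal_training = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
model = fit_normal_model(normal_training)

THRESHOLD = 3.0  # assumed cut-off; in practice it is tuned to the application
flags = {x: anomaly_score(x, model) > THRESHOLD for x in [10.05, 14.0]}
print(flags)  # the reading 14.0 is flagged, 10.05 is not
```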

==== Robot learning ====

In [[developmental robotics]], [[robot learning]] algorithms generate their own sequences of learning experiences, also known as a curriculum, to cumulatively acquire new skills through self-guided exploration and social interaction with humans. These robots use guidance mechanisms such as active learning, maturation, [[Motor_coordination#Muscle_synergies|motor synergies]] and imitation.

==== Association rules ====

{{Main|Association rule learning}}{{See also|Inductive logic programming}}

Association rule learning is a [[rule-based machine learning]] method for discovering relationships between variables in large databases. It is intended to identify strong rules discovered in databases using some measure of "interestingness".<ref name="piatetsky">Piatetsky-Shapiro, Gregory (1991), ''Discovery, analysis, and presentation of strong rules'', in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., ''Knowledge Discovery in Databases'', AAAI/MIT Press, Cambridge, MA.</ref>

Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based machine learning algorithm is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learning algorithms that commonly identify a singular model that can be universally applied to any instance in order to make a prediction.<ref>{{Cite journal|last=Bassel|first=George W.|last2=Glaab|first2=Enrico|last3=Marquez|first3=Julietta|last4=Holdsworth|first4=Michael J.|last5=Bacardit|first5=Jaume|date=2011-09-01|title=Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets|journal=The Plant Cell|language=en|volume=23|issue=9|pages=3101–3116|doi=10.1105/tpc.111.088153|issn=1532-298X|pmc=3203449|pmid=21896882}}</ref> Rule-based machine learning approaches include [[learning classifier system]]s, association rule learning, and [[artificial immune system]]s.

Based on the concept of strong rules, [[Rakesh Agrawal (computer scientist)|Rakesh Agrawal]], [[Tomasz Imieliński]] and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by [[point-of-sale]] (POS) systems in supermarkets.<ref name="mining">{{Cite book | last1 = Agrawal | first1 = R. | last2 = Imieliński | first2 = T. | last3 = Swami | first3 = A. | doi = 10.1145/170035.170072 | chapter = Mining association rules between sets of items in large databases | title = Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93 | pages = 207 | year = 1993 | isbn = 978-0897915922 | pmid = | pmc = | citeseerx = 10.1.1.40.6984 }}</ref> For example, the rule <math>\{\mathrm{onions, potatoes}\} \Rightarrow \{\mathrm{burger}\}</math> found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional [[pricing]] or [[product placement]]s. In addition to [[market basket analysis]], association rules are employed today in application areas including [[Web usage mining]], [[intrusion detection]], [[continuous production]], and [[bioinformatics]]. In contrast with [[sequence mining]], association rule learning typically does not consider the order of items either within a transaction or across transactions.
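
The strength of such a rule is commonly quantified by its ''support'' (the fraction of transactions containing all items of the rule) and its ''confidence'' (the fraction of transactions containing the antecedent that also contain the consequent). The following Python sketch computes both for the rule above; the transactions are invented for illustration, and the helper names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical point-of-sale data: one set of items per transaction.
transactions = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "beer"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]

rule_support = support({"onions", "potatoes", "burger"}, transactions)
rule_confidence = confidence({"onions", "potatoes"}, {"burger"}, transactions)
print(rule_support)     # 2 of 5 transactions contain all three items
print(rule_confidence)  # 2 of the 3 onion-and-potato baskets also contain burger
```

Because items are stored in sets, the order in which they were bought plays no role, which matches the contrast with sequence mining noted above.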

Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component, typically a [[genetic algorithm]], with a learning component, performing either [[supervised learning]], [[reinforcement learning]], or [[unsupervised learning]]. They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a [[piecewise]] manner in order to make predictions.<ref>{{Cite journal|last=Urbanowicz|first=Ryan J.|last2=Moore|first2=Jason H.|date=2009-09-22|title=Learning Classifier Systems: A Complete Introduction, Review, and Roadmap|journal=Journal of Artificial Evolution and Applications|language=en|volume=2009|pages=1–25|doi=10.1155/2009/736398|issn=1687-6229|doi-access=free}}</ref>

Inductive logic programming (ILP) is an approach to rule-learning using [[logic programming]] as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that [[Entailment|entails]] all positive and no negative examples. [[Inductive programming]] is a related field that considers any kind of programming languages for representing hypotheses (and not only logic programming), such as [[Functional programming|functional programs]].

Inductive logic programming is particularly useful in [[bioinformatics]] and [[natural language processing]]. [[Gordon Plotkin]] and [[Ehud Shapiro]] laid the initial theoretical foundation for inductive machine learning in a logical setting.<ref>Plotkin G.D. [https://www.era.lib.ed.ac.uk/bitstream/handle/1842/6656/Plotkin1972.pdf;sequence=1 Automatic Methods of Inductive Inference], PhD thesis, University of Edinburgh, 1970.</ref><ref>Shapiro, Ehud Y. [http://ftp.cs.yale.edu/publications/techreports/tr192.pdf Inductive inference of theories from facts], Research Report 192, Yale University, Department of Computer Science, 1981. Reprinted in J.-L. Lassez, G. Plotkin (Eds.), Computational Logic, The MIT Press, Cambridge, MA, 1991, pp. 199–254.</ref><ref>Shapiro, Ehud Y. (1983). ''Algorithmic program debugging''. Cambridge, Mass: MIT Press. {{ISBN|0-262-19218-7}}</ref> Shapiro built the first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples.<ref>Shapiro, Ehud Y. "[http://dl.acm.org/citation.cfm?id=1623364 The model inference system]." Proceedings of the 7th international joint conference on Artificial intelligence-Volume 2. Morgan Kaufmann Publishers Inc., 1981.</ref> The term ''inductive'' here refers to [[Inductive reasoning|philosophical]] induction (suggesting a theory to explain observed facts) rather than [[mathematical induction|mathematical]] induction (proving a property for all members of a well-ordered set).

=== Models ===

Performing machine learning involves creating a [[Statistical model|model]], which is trained on some training data and then can process additional data to make predictions. Various types of models have been used and researched for machine learning systems.

==== Artificial neural networks ====

{{Main|Artificial neural network}}{{See also|Deep learning}}

[[File:Colored neural network.svg|thumb|300px|An artificial neural network is an interconnected group of nodes, akin to the vast network of [[neuron]]s in a [[brain]]. Here, each circular node represents an [[artificial neuron]] and an arrow represents a connection from the output of one artificial neuron to the input of another.]]

Artificial neural networks (ANNs), or [[Connectionism|connectionist]] systems, are computing systems vaguely inspired by the [[biological neural network]]s that constitute animal [[brain]]s. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules.

An ANN is a model based on a collection of connected units or nodes called "[[artificial neuron]]s", which loosely model the [[neuron]]s in a biological [[brain]]. Each connection, like the [[synapse]]s in a biological [[brain]], can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a [[real number]], and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a [[weight (mathematics)|weight]] that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times.
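
The computation performed by a single artificial neuron, and by a layer of them, can be sketched as follows. This is a minimal Python illustration, assuming a sigmoid as the non-linear function and an additive bias term (neither is specified above); the weights are arbitrary and no learning takes place.

```python
import math


def sigmoid(z):
    """A common non-linear activation function (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Non-linear function of the weighted sum of the incoming signals."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


def layer_output(inputs, weight_rows, biases):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = layer_output([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = layer_output(hidden, [[1.0, 1.0]], [0.0])
print(hidden, output)
```

Training would consist of adjusting the entries of the weight lists so that the output moves closer to the desired value for each example.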

The original goal of the ANN approach was to solve problems in the same way that a [[human brain]] would. However, over time, attention moved to performing specific tasks, leading to deviations from [[biology]]. Artificial neural networks have been used on a variety of tasks, including [[computer vision]], [[speech recognition]], [[machine translation]], [[social network]] filtering, [[general game playing|playing board and video games]] and [[medical diagnosis]].

[[Deep learning]] uses artificial neural networks with multiple hidden layers. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are [[computer vision]] and [[speech recognition]].<ref>Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.802&rep=rep1&type=pdf Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations]" Proceedings of the 26th Annual International Conference on Machine Learning, 2009.</ref>

==== Decision trees ====

{{Main|Decision tree learning}}

Decision tree learning uses a [[decision tree]] as a [[Predictive modelling|predictive model]] to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, [[leaf node|leaves]] represent class labels and branches represent [[Logical conjunction|conjunction]]s of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically [[real numbers]]) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and [[decision making]]. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.
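
How a classification tree maps observations to a class label can be sketched in Python. The tree below is written by hand rather than learned from data, and its features (<code>outlook</code>, <code>windy</code>) and labels are hypothetical; the sketch only shows the prediction step.

```python
# Internal nodes test one feature; leaves carry a class label.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {
                True: {"label": "stay in"},
                False: {"label": "play"},
            },
        },
        "rainy": {"label": "stay in"},
    },
}


def predict(node, item):
    """Follow the branch matching each observed feature until a leaf is reached."""
    while "label" not in node:
        node = node["branches"][item[node["feature"]]]
    return node["label"]


print(predict(tree, {"outlook": "sunny", "windy": False}))
print(predict(tree, {"outlook": "rainy", "windy": True}))
```

Each path from the root to a leaf corresponds to a conjunction of feature tests, for example "outlook is sunny and windy is false" for the leaf labeled "play".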

==== Support vector machines ====

{{Main|Support vector machines}}

Support vector machines (SVMs), also known as support vector networks, are a set of related [[supervised learning]] methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.<ref name="CorinnaCortes">{{Cite journal |last1=Cortes |first1=Corinna |authorlink1=Corinna Cortes |last2=Vapnik |first2=Vladimir N. |year=1995 |title=Support-vector networks |journal=[[Machine Learning (journal)|Machine Learning]] |volume=20 |issue=3 |pages=273–297 |doi=10.1007/BF00994018 |doi-access=free }}</ref> The resulting model is a non-[[probabilistic classification|probabilistic]], [[binary classifier|binary]], [[linear classifier]], although methods such as [[Platt scaling]] exist to use SVMs in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the [[kernel trick]], implicitly mapping their inputs into high-dimensional feature spaces.
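
The kernel trick can be illustrated with a small numerical check. The Python sketch below assumes two-dimensional inputs and the degree-2 polynomial kernel <math>k(x, y) = (x \cdot y)^2</math>, chosen here only as an example; it shows that evaluating the kernel gives the same value as an inner product in a three-dimensional feature space, without ever constructing that space.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel on 2-D inputs: (x . y) ** 2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def explicit_map(x):
    """Feature map whose inner product reproduces the kernel above."""
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]


x, y = [1.0, 2.0], [3.0, 0.5]

implicit = poly_kernel(x, y)
explicit = sum(a * b for a, b in zip(explicit_map(x), explicit_map(y)))
print(implicit, explicit)  # both are 16, up to floating-point rounding
```

For kernels whose feature space is very high-dimensional or infinite-dimensional, only the implicit computation is practical, which is why SVMs are formulated in terms of kernel evaluations.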

[[Image:Linear regression.svg|thumb|upright=1.3|Illustration of linear regression on a data set.]]

==== Regression analysis ====

{{Main|Regression analysis}}

Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and an associated outcome variable. Its most common form is [[linear regression]], where a single line is drawn to best fit the given data according to a mathematical criterion such as [[ordinary least squares]]. The latter is often extended by [[regularization (mathematics)|regularization]] methods to mitigate overfitting and bias, as in [[ridge regression]]. When dealing with non-linear problems, go-to models include [[polynomial regression]] (for example, used for trendline fitting in Microsoft Excel<ref>{{cite web|last1=Stevenson|first1=Christopher|title=Tutorial: Polynomial Regression in Excel|url=https://facultystaff.richmond.edu/~cstevens/301/Excel4.html|website=facultystaff.richmond.edu|accessdate=22 January 2017}}</ref>), [[logistic regression]] (often used in [[statistical classification]]) or even [[kernel regression]], which introduces non-linearity by taking advantage of the [[kernel trick]] to implicitly map input variables to higher dimensional space.
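
For a single input variable, both ordinary least squares and ridge regression have simple closed-form solutions, sketched below in Python. The sketch assumes that only the slope is penalized and the intercept is left free, which is one common convention; the function name <code>fit_line</code>, the data and the penalty value are hypothetical.

```python
def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y = slope * x + intercept.

    With ridge_lambda == 0 this is ordinary least squares; a positive value
    adds a penalty ridge_lambda * slope ** 2 and shrinks the slope toward 0.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + ridge_lambda)
    intercept = y_mean - slope * x_mean
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1 without noise

ols = fit_line(xs, ys)
ridge = fit_line(xs, ys, ridge_lambda=5.0)
print(ols)    # slope 2, intercept 1
print(ridge)  # a smaller slope than the least-squares fit
```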

==== Bayesian networks ====

{{Main|Bayesian network}}

[[Image:SimpleBayesNetNodes.svg|thumb|right|A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.]]

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic [[graphical model]] that represents a set of [[random variables]] and their [[conditional independence]] with a [[directed acyclic graph]] (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform [[inference]] and learning. Bayesian networks that model sequences of variables, like [[speech recognition|speech signals]] or [[peptide sequence|protein sequences]], are called [[dynamic Bayesian network]]s. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called [[influence diagram]]s.
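
The rain, sprinkler and wet-grass network can be used to sketch inference by enumeration, which sums the joint distribution over the unobserved variables. The probability values below are illustrative assumptions, not taken from the text, and enumeration is only practical for very small networks.

```python
from itertools import product

# Illustrative conditional probability tables (assumed values).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
# Keyed by (sprinkler, rain): probability that the grass is wet.
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the edges of the DAG."""
    p = P_RAIN if rain else 1 - P_RAIN
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p *= p_s if sprinkler else 1 - p_s
    p_w = P_WET[(sprinkler, rain)]
    p *= p_w if wet else 1 - p_w
    return p


def prob_rain_given_wet():
    """P(rain | grass wet), summing out the sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / denominator
```

Under these assumed tables, observing wet grass raises the probability of rain from the prior of 0.2 to roughly 0.36.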

==== Genetic algorithms ====

{{Main|Genetic algorithm}}

A genetic algorithm (GA) is a [[search algorithm]] and [[heuristic (computer science)|heuristic]] technique that mimics the process of [[natural selection]], using methods such as [[Mutation (genetic algorithm)|mutation]] and [[Crossover (genetic algorithm)|crossover]] to generate new [[Chromosome (genetic algorithm)|genotype]]s in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.<ref>{{cite journal |last1=Goldberg |first1=David E. |first2=John H. |last2=Holland |title=Genetic algorithms and machine learning |journal=[[Machine Learning (journal)|Machine Learning]] |volume=3 |issue=2 |year=1988 |pages=95–99 |doi=10.1007/bf00113892|url=https://deepblue.lib.umich.edu/bitstream/2027.42/46947/1/10994_2005_Article_422926.pdf }}</ref><ref>{{Cite journal |title=Machine Learning, Neural and Statistical Classification |journal=Ellis Horwood Series in Artificial Intelligence |first1=D. |last1=Michie |first2=D. J. |last2=Spiegelhalter |first3=C. C. |last3=Taylor |year=1994 |bibcode=1994mlns.book.....M }}</ref> Conversely, machine learning techniques have been used to improve the performance of genetic and [[evolutionary algorithm]]s.<ref>{{cite journal |last1=Zhang |first1=Jun |last2=Zhan |first2=Zhi-hui |last3=Lin |first3=Ying |last4=Chen |first4=Ni |last5=Gong |first5=Yue-jiao |last6=Zhong |first6=Jing-hui |last7=Chung |first7=Henry S.H. |last8=Li |first8=Yun |last9=Shi |first9=Yu-hui |title=Evolutionary Computation Meets Machine Learning: A Survey |journal=Computational Intelligence Magazine |year=2011 |volume=6 |issue=4 |pages=68–75 |doi=10.1109/mci.2011.942584}}</ref>
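
A minimal sketch of the mutation and crossover loop described above, applied to the toy "OneMax" problem (maximize the number of 1 bits in a bit string). The problem, the tournament selection scheme, the elitism step and all parameter values are assumptions chosen for illustration.

```python
import random


def fitness(bits):
    """Toy objective: the number of 1 bits in the genotype."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of three random individuals.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # Single-point crossover.
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best genotype is always copied into the next generation, the recorded best fitness never decreases from one generation to the next.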

=== Training models ===

Machine learning models usually require a lot of data in order to perform well. When training a model, one typically needs to collect a large, representative sample of data for the training set. Data in the training set can be as varied as a corpus of text, a collection of images, or data collected from individual users of a service. [[Overfitting]], in which a model fits the training data closely but generalizes poorly to new data, is something to watch out for when training a machine learning model.

==== Federated learning ====

{{Main|Federated learning}}

Federated learning is a new approach to training machine learning models that decentralizes the training process, allowing users' privacy to be maintained because their data does not need to be sent to a centralized server. It also increases efficiency by distributing the training process across many devices. For example, [[Gboard]] uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to [[Google]].<ref>{{Cite web|url=http://ai.googleblog.com/2017/04/federated-learning-collaborative.html|title=Federated Learning: Collaborative Machine Learning without Centralized Training Data|website=Google AI Blog|language=en|access-date=2019-06-08}}</ref>
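
One way to picture decentralized training is the following sketch of weighted model averaging (often called federated averaging), which is an assumed illustration and not a description of how Gboard is implemented. Each client takes a gradient step on its own data for the one-parameter model ''y'' = ''w''·''x'' and sends back only the updated weight; the raw data never leaves the client. All names and values are hypothetical.

```python
def local_update(weight, data, learning_rate=0.05):
    """One gradient-descent step on a client's private (x, y) pairs."""
    gradient = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - learning_rate * gradient


def federated_average(client_datasets, rounds=50):
    """Server loop: broadcast the weight, then average the client updates."""
    weight = 0.0
    total = sum(len(data) for data in client_datasets)
    for _ in range(rounds):
        # Each client returns only its updated weight and its sample count.
        updates = [(local_update(weight, data), len(data)) for data in client_datasets]
        # Average the updates, weighted by how much data each client holds.
        weight = sum(w * n for w, n in updates) / total
    return weight


# Two hypothetical clients whose private data both follow y = 3x.
clients = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
learned_weight = federated_average(clients)  # approaches 3.0
```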

== Applications ==

There are many applications for machine learning, including:

{{div col|colwidth=15em}}
* [[Precision agriculture|Agriculture]]
* [[Computational anatomy|Anatomy]]
* [[Adaptive website]]s
* [[Affective computing]]
* [[Banking]]
* [[Bioinformatics]]
* [[Brain–machine interface]]s
* [[Cheminformatics]]
* [[Citizen science]]
* [[Network simulation|Computer networks]]
* [[Computer vision]]
* [[Credit-card fraud]] detection
* [[Data quality]]
* [[DNA sequence]] classification
* [[Computational economics|Economics]]
* [[Financial market]] analysis<ref>Machine learning is included in the [[Chartered_Financial_Analyst_(CFA)#Curriculum|CFA Curriculum]] (discussion is top down); see: [https://www.cfainstitute.org/-/media/documents/study-session/2020-l2-ss3.ashx Kathleen DeRose and Christophe Le Lanno (2020). "Machine Learning"].</ref>
* [[General game playing]]
* [[Handwriting recognition]]
* [[Information retrieval]]
* [[Insurance]]
* [[Internet fraud]] detection
* [[Computational linguistics|Linguistics]]
* [[Machine learning control]]
* [[Machine perception]]
* [[Machine translation]]
* [[Marketing]]
* [[Automated medical diagnosis|Medical diagnosis]]
* [[Natural language processing]]
* [[Natural language understanding]]
* [[Online advertising]]
* [[Mathematical optimization|Optimization]]
* [[Recommender system]]s
* [[Robot locomotion]]
* [[Search engines]]
* [[Sentiment analysis]]
* [[Sequence mining]]
* [[Software engineering]]
* [[Speech recognition]]
* [[Structural health monitoring]]
* [[Syntactic pattern recognition]]
* [[Telecommunication]]
* [[Automated theorem proving|Theorem proving]]
* [[Time series|Time series forecasting]]
* [[User behavior analytics]]
{{div col end}}

In 2006, the media-services provider [[Netflix]] held the first "[[Netflix Prize]]" competition to find a program to better predict user preferences and improve the accuracy of its existing Cinematch movie recommendation algorithm by at least 10%. A joint team made up of researchers from [[AT&T Labs]]-Research in collaboration with the teams Big Chaos and Pragmatic Theory built an [[Ensemble Averaging|ensemble model]] to win the $1 million Grand Prize in 2009.<ref>[https://web.archive.org/web/20151110062742/http://www2.research.att.com/~volinsky/netflix/ "BellKor Home Page"] research.att.com</ref> Shortly after the prize was awarded, Netflix realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a recommendation") and changed its recommendation engine accordingly.<ref>{{cite web|url=http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html|title=The Netflix Tech Blog: Netflix Recommendations: Beyond the 5 stars (Part 1)|accessdate=8 August 2015|date=2012-04-06|archive-url=https://web.archive.org/web/20160531002916/http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html|archive-date=31 May 2016|url-status=dead}}</ref> In 2010, ''The Wall Street Journal'' wrote about the firm Rebellion Research and its use of machine learning to predict the financial crisis.<ref>{{cite web|url=https://www.wsj.com/articles/SB10001424052748703834604575365310813948080|title=Letting the Machines Decide|author=Scott Patterson|date=13 July 2010|publisher=[[The Wall Street Journal]]|accessdate=24 June 2018}}</ref> In 2012, [[Vinod Khosla]], co-founder of [[Sun Microsystems]], predicted that 80% of medical doctors' jobs would be lost in the next two decades to automated machine learning medical diagnostic software.<ref>{{cite web|url=https://techcrunch.com/2012/01/10/doctors-or-algorithms/|author=Vinod Khosla|publisher=TechCrunch|title=Do We Need Doctors or Algorithms?|date=January 10, 2012}}</ref> In 2014, it was reported that a machine learning algorithm had been applied in the field of art history to study fine art paintings, and that it may have revealed previously unrecognized influences among artists.<ref>[https://medium.com/the-physics-arxiv-blog/when-a-machine-learning-algorithm-studied-fine-art-paintings-it-saw-things-art-historians-had-never-b8e4e7bf7d3e When A Machine Learning Algorithm Studied Fine Art Paintings, It Saw Things Art Historians Had Never Noticed], ''The Physics at [[ArXiv]] blog''</ref> In 2019, [[Springer Nature]] published the first research book created using machine learning.<ref>{{Cite web|url=https://www.theverge.com/2019/4/10/18304558/ai-writing-academic-research-book-springer-nature-artificial-intelligence|title=The first AI-generated textbook shows what robot writers are actually good at|last=Vincent|first=James|date=2019-04-10|website=The Verge|access-date=2019-05-05}}</ref>

== Limitations ==

Although machine learning has been transformative in some fields, machine-learning programs often fail to deliver expected results.<ref>{{Cite news|url=https://www.bloomberg.com/news/articles/2016-11-10/why-machine-learning-models-often-fail-to-learn-quicktake-q-a|title=Why Machine Learning Models Often Fail to Learn: QuickTake Q&A|date=2016-11-10|work=Bloomberg.com|access-date=2017-04-10|archive-url=https://web.archive.org/web/20170320225010/https://www.bloomberg.com/news/articles/2016-11-10/why-machine-learning-models-often-fail-to-learn-quicktake-q-a|archive-date=2017-03-20}}</ref><ref>{{Cite news|url=https://hbr.org/2017/04/the-first-wave-of-corporate-ai-is-doomed-to-fail|title=The First Wave of Corporate AI Is Doomed to Fail|date=2017-04-18|work=Harvard Business Review|access-date=2018-08-20}}</ref><ref>{{Cite news|url=https://venturebeat.com/2016/09/17/why-the-a-i-euphoria-is-doomed-to-fail/|title=Why the A.I. euphoria is doomed to fail|date=2016-09-18|work=VentureBeat|access-date=2018-08-20|language=en-US}}</ref> Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.<ref>{{Cite web|url=https://www.kdnuggets.com/2018/07/why-machine-learning-project-fail.html|title=9 Reasons why your machine learning project will fail|website=www.kdnuggets.com|language=en-US|access-date=2018-08-20}}</ref>

In 2018, a self-driving car from [[Uber]] failed to detect a pedestrian, who was killed in the resulting collision.<ref>{{Cite news|url=https://www.economist.com/the-economist-explains/2018/05/29/why-ubers-self-driving-car-killed-a-pedestrian|title=Why Uber's self-driving car killed a pedestrian|work=The Economist|access-date=2018-08-20|language=en}}</ref> Attempts to use machine learning in healthcare with the [[Watson (computer)|IBM Watson]] system failed to deliver even after years of work and billions of dollars of investment.<ref>{{Cite news|url=https://www.statnews.com/2018/07/25/ibm-watson-recommended-unsafe-incorrect-treatments/|title=IBM's Watson recommended 'unsafe and incorrect' cancer treatments - STAT|date=2018-07-25|work=STAT|access-date=2018-08-21|language=en-US}}</ref><ref>{{Cite news|url=https://www.wsj.com/articles/ibm-bet-billions-that-watson-could-improve-cancer-treatment-it-hasnt-worked-1533961147|title=IBM Has a Watson Dilemma|last=Hernandez|first=Daniela|date=2018-08-11|work=Wall Street Journal|access-date=2018-08-21|last2=Greenwald|first2=Ted|language=en-US|issn=0099-9660}}</ref>

===Bias===

{{main|Algorithmic bias}}

Machine learning approaches in particular can suffer from different data biases. A machine learning system trained only on current customers may not be able to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same constitutional and unconscious biases already present in society.<ref>{{Cite journal|last=Garcia|first=Megan|date=2016|title=Racist in the Machine|journal=World Policy Journal|language=en|volume=33|issue=4|pages=111–117|doi=10.1215/07402775-3813015|issn=0740-2775|url=https://www.semanticscholar.org/paper/eeafa41f48e8f5be764be20db8260609a49381fa}}</ref> Language models learned from data have been shown to contain human-like biases.<ref>{{Cite journal|last=Caliskan|first=Aylin|last2=Bryson|first2=Joanna J.|last3=Narayanan|first3=Arvind|date=2017-04-14|title=Semantics derived automatically from language corpora contain human-like biases|journal=Science|language=en|volume=356|issue=6334|pages=183–186|doi=10.1126/science.aal4230|issn=0036-8075|pmid=28408601|bibcode=2017Sci...356..183C|arxiv=1608.07187}}</ref><ref>{{Citation|last=Wang|first=Xinan|title=An algorithm for L1 nearest neighbor search via monotonic embedding|date=2016|url=http://papers.nips.cc/paper/6227-an-algorithm-for-l1-nearest-neighbor-search-via-monotonic-embedding.pdf|work=Advances in Neural Information Processing Systems 29|pages=983–991|editor-last=Lee|editor-first=D. D.|publisher=Curran Associates, Inc.|access-date=2018-08-20|last2=Dasgupta|first2=Sanjoy|editor2-last=Sugiyama|editor2-first=M.|editor3-last=Luxburg|editor3-first=U. V.|editor4-last=Guyon|editor4-first=I.}}</ref> Machine learning systems used for criminal risk assessment have been found to be biased against black people.<ref>{{Cite web|url=https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing|title=Machine Bias|date=2016-05-23|website=[[ProPublica]]|language=|author1=Julia Angwin |author2=Jeff Larson |author3=Lauren Kirchner |author4=Surya Mattu|archive-url=|archive-date=|url-status=|access-date=2018-08-20}}</ref><ref>{{Cite news|url=https://www.nytimes.com/2017/10/26/opinion/algorithm-compas-sentencing-bias.html|title=Opinion {{!}} When an Algorithm Helps Send You to Prison|last=|first=|date=|work=[[New York Times]]|access-date=2018-08-20|language=en}}</ref> In 2015, Google Photos would often tag black people as gorillas.<ref>{{Cite news|url=https://www.bbc.co.uk/news/technology-33347866|title=Google apologises for racist blunder|date=2015-07-01|work=BBC News|access-date=2018-08-20|language=en-GB}}</ref> In 2018, this had still not been well resolved: Google reportedly was still using the workaround of removing all gorillas from the training data, and thus was not able to recognize real gorillas at all.<ref>{{Cite news|url=https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai|title=Google 'fixed' its racist algorithm by removing gorillas from its image-labeling tech|work=The Verge|access-date=2018-08-20}}</ref> Similar issues with recognizing non-white people have been found in many other systems.<ref>{{Cite news|url=https://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html|title=Opinion {{!}} Artificial Intelligence's White Guy Problem|last=|first=|date=|work=[[New York Times]]|access-date=2018-08-20|language=en}}</ref> In 2016, Microsoft tested a [[chatbot]] that learned from Twitter, and it quickly picked up racist and sexist language.<ref>{{Cite news|url=https://www.technologyreview.com/s/601111/why-microsoft-accidentally-unleashed-a-neo-nazi-sexbot/|title=Why Microsoft's teen chatbot, Tay, said lots of awful things online|last=Metz|first=Rachel|work=MIT Technology Review|access-date=2018-08-20|language=en}}</ref> Because of such challenges, effective use of machine learning may take longer to be adopted in other domains.<ref>{{Cite news|url=https://www.technologyreview.com/s/603944/microsoft-ai-isnt-yet-adaptable-enough-to-help-businesses/|title=Microsoft says its racist chatbot illustrates how AI isn't adaptable enough to help most businesses|last=Simonite|first=Tom|work=MIT Technology Review|access-date=2018-08-20|language=en}}</ref> Concern for [[Fairness (machine learning)|fairness]] in machine learning, that is, reducing bias in machine learning and propelling its use for human good, is increasingly expressed by artificial intelligence scientists, including [[Fei-Fei Li]], who reminds engineers that "There's nothing artificial about AI ... It's inspired by people, it's created by people, and, most importantly, it impacts people. It is a powerful tool we are only just beginning to understand, and that is a profound responsibility."<ref>{{Cite news|url=https://www.wired.com/story/fei-fei-li-artificial-intelligence-humanity/|title=Fei-Fei Li's Quest to Make Machines Better for Humanity|last=Hempel|first=Jessi|date=2018-11-13|work=Wired|access-date=2019-02-17|issn=1059-1028}}</ref>

== Model assessments ==

Classification machine learning models can be validated by accuracy estimation techniques like the [[Test set|holdout]] method, which splits the data into a training set and a test set (conventionally 2/3 for training and 1/3 for testing) and evaluates the performance of the trained model on the test set. In comparison, the ''K''-fold [[Cross-validation (statistics)|cross-validation]] method randomly partitions the data into ''K'' subsets and then performs ''K'' experiments, each using one subset for evaluation and the remaining ''K''−1 subsets for training the model. In addition to the holdout and cross-validation methods, the [[Bootstrapping|bootstrap]], which samples ''n'' instances with replacement from the dataset, can be used to assess model accuracy.<ref>{{cite journal|last1=Kohavi|first1=Ron|title=A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection|journal=International Joint Conference on Artificial Intelligence|date=1995|url=http://web.cs.iastate.edu/~jtian/cs573/Papers/Kohavi-IJCAI-95.pdf}}</ref>
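
The three resampling schemes can be sketched in terms of sample indices alone, leaving the model and the scoring function aside. The function names and the fixed random seed below are illustrative assumptions.

```python
import random


def holdout_split(n_samples, train_fraction=2 / 3, seed=0):
    """Shuffle the indices and cut once: a training part and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def bootstrap_sample(n_samples, seed=0):
    """Draw n indices with replacement; some indices typically repeat."""
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]
```

In cross-validation every sample is used for testing exactly once, whereas a bootstrap sample usually omits some samples, and those can serve as an evaluation set.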

In addition to overall accuracy, investigators frequently report [[sensitivity and specificity]], meaning the True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators sometimes report the [[False Positive Rate]] (FPR) as well as the [[False Negative Rate]] (FNR). However, these rates are ratios that fail to reveal their numerators and denominators. The [[Total Operating Characteristic]] (TOC) is an effective method to express a model's diagnostic ability. TOC shows the numerators and denominators of the previously mentioned rates, and thus provides more information than the commonly used [[Receiver Operating Characteristic]] (ROC) and ROC's associated Area Under the Curve (AUC).<ref>{{cite journal|last1=Pontius|first1=Robert Gilmore|last2=Si|first2=Kangping|title=The total operating characteristic to measure diagnostic ability for multiple thresholds| journal=International Journal of Geographical Information Science|volume=28|issue=3|year=2014|pages=570–583|doi=10.1080/13658816.2013.862623}}</ref>
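
The four rates follow directly from the counts in a confusion matrix. The sketch below uses made-up counts and a hypothetical helper name to show why the rates alone hide the underlying numerators and denominators: two studies of very different size can report identical rates.

```python
def rates(tp, fn, fp, tn):
    """Compute the four rates from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


# Made-up counts: the second study is ten times larger than the first.
small_study = rates(tp=8, fn=2, fp=1, tn=9)
large_study = rates(tp=80, fn=20, fp=10, tn=90)
same_rates = all(
    abs(small_study[name] - large_study[name]) < 1e-12 for name in small_study
)
```

Here <code>same_rates</code> is true even though the two sets of counts differ, which is the information that a presentation keeping the counts preserves.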

== Ethics ==

Machine learning poses a host of [[Machine ethics|ethical questions]]. Systems which are trained on datasets collected with biases may exhibit these biases upon use ([[algorithmic bias]]), thus digitizing cultural prejudices.<ref>{{Cite web|url=http://www.nickbostrom.com/ethics/artificial-intelligence.pdf|title=The Ethics of Artificial Intelligence|last=Bostrom|first=Nick|date=2011|website=|access-date=11 April 2016|archive-url=https://web.archive.org/web/20160304015020/http://www.nickbostrom.com/ethics/artificial-intelligence.pdf|archive-date=4 March 2016|url-status=dead}}</ref> For example, using job hiring data from a firm with racist hiring policies may lead to a machine learning system duplicating the bias by scoring job applicants against similarity to previous successful applicants.<ref name="Edionwe Outline">{{cite web|last1=Edionwe|first1=Tolulope|title=The fight against racist algorithms|url=https://theoutline.com/post/1571/the-fight-against-racist-algorithms|website=The Outline|accessdate=17 November 2017}}</ref><ref name="Jeffries Outline">{{cite web|last1=Jeffries|first1=Adrianne|title=Machine learning is racist because the internet is racist|url=https://theoutline.com/post/1439/machine-learning-is-racist-because-the-internet-is-racist|website=The Outline|accessdate=17 November 2017}}</ref> Responsible [[Data collection|collection of data]] and documentation of algorithmic rules used by a system thus is a critical part of machine learning.

Because human languages contain biases, machines trained on language ''[[Text corpus|corpora]]'' will necessarily also learn these biases.<ref>{{cite arXiv|eprint=1809.02208|class=cs.CY|author-link=|authors=M.O.R. Prates, P.H.C. Avelar, L.C. Lamb|title=Assessing Gender Bias in Machine Translation -- A Case Study with Google Translate|date=11 Mar 2019}}</ref><ref>{{cite web |url=https://freedom-to-tinker.com/2016/08/24/language-necessarily-contains-human-biases-and-so-will-machines-trained-on-language-corpora/ |title=Language necessarily contains human biases, and so will machines trained on language corpora |date=August 24, 2016 |first=Arvind |last=Narayanan |website=Freedom to Tinker}}</ref>

Other forms of ethical challenges, not related to personal biases, are seen more often in health care. There are concerns among health care professionals that these systems might not be designed in the public's interest but as income-generating machines. This is especially true in the United States, where there is a long-standing ethical dilemma between improving health care and increasing profits. For example, the algorithms could be designed to provide patients with unnecessary tests or medication in which the algorithm's proprietary owners hold stakes. There is huge potential for machine learning in health care to provide professionals with a great tool to diagnose, medicate, and even plan recovery paths for patients, but this will not happen until the personal biases mentioned previously and these "greed" biases are addressed.<ref>{{cite journal |last=Char |first=D. S. |last2=Shah |first2=N. H. |last3=Magnus |first3=D. |year=2018 |title=Implementing Machine Learning in Health Care—Addressing Ethical Challenges |journal=[[New England Journal of Medicine]] |volume=378 |issue=11 |pages=981–983 |doi=10.1056/nejmp1714229 |pmid=29539284 |pmc=5962261 }}</ref>

== Software ==

[[Software suite]]s containing a variety of machine learning algorithms include the following:

=== Free and open-source software{{anchor|Open-source_software}} ===

{{Div col|colwidth=18em}}

* [[Microsoft Cognitive Toolkit|CNTK]]

* [[Deeplearning4j]]

* [[ELKI]]

* [[Keras]]

* [[Caffe (software)|Caffe]]

* [[ML.NET]]

* [[Apache Mahout|Mahout]]

* [[Mallet (software project)|Mallet]]

* [[mlpack]]

* [[MXNet]]

* [[Neural Lab]]

* [[GNU Octave]]

* [[OpenNN]]

* [[Orange (software)|Orange]]

* [[Perl Data Language]]

* [[scikit-learn]]

* [[Shogun (toolbox)|Shogun]]

* [[Apache Spark#MLlib Machine Learning Library|Spark MLlib]]

* [[Apache SystemML]]

* [[TensorFlow]]

* [[ROOT]] (TMVA with ROOT)

* [[Torch (machine learning)|Torch]] / [[PyTorch]]

* [[Weka (machine learning)|Weka]] / [[MOA (Massive Online Analysis)|MOA]]

* [[Yooreeka]]

* [[R (programming language)|R]]

{{Div col end}}

=== Proprietary software with free and open-source editions ===

{{Div col|colwidth=18em}}

* [[KNIME]]

* [[RapidMiner]]

{{Div col end}}

=== Proprietary software ===

{{Div col|colwidth=18em}}

* [[Amazon Machine Learning]]

* [[Angoss]] KnowledgeSTUDIO

* [[Azure Machine Learning]]

* [[Ayasdi]]

* [[IBM Data Science Experience]]

* [[Google APIs|Google Prediction API]]

* [[SPSS Modeler|IBM SPSS Modeler]]

* [[KXEN Inc.|KXEN Modeler]]

* [[LIONsolver]]

* [[Mathematica]]

* [[MATLAB]]

* [[Microsoft Azure]]

* [[Neural Designer]]

* [[NeuroSolutions]]

* [[Oracle Data Mining]]

* [[Oracle Cloud#Platform as a Service (PaaS)|Oracle AI Platform Cloud Service]]

* [[RCASE]]

* [[SAS (software)#Components|SAS Enterprise Miner]]

* [[SequenceL]]

* [[Splunk]]

* [[STATISTICA]] Data Miner

{{Div col end}}

== Journals ==

* ''[[Journal of Machine Learning Research]]''

* [[Machine Learning (journal)|''Machine Learning'']]

* ''[[Nature Machine Intelligence]]''

* [[Neural Computation (journal)|''Neural Computation'']]

== Conferences ==

* [[Conference on Neural Information Processing Systems]]

* [[International Conference on Machine Learning]]

== See also ==

{{columns-list|

* {{annotated link|Automated machine learning}}

* {{annotated link|Big data}}

* {{annotated link|Explanation-based learning}}

* {{annotated link|List of important publications in computer science#Machine learning|Important publications in machine learning}}

* {{annotated link|List of datasets for machine learning research}}

* {{annotated link|Predictive analytics}}

* {{annotated link|Quantum machine learning}}

* {{annotated link|Machine learning in bioinformatics|Machine-learning applications in bioinformatics}}

* {{annotated link|Seq2seq}}

* {{annotated link|Fairness (machine learning)}}}}

== References ==

{{Reflist|30em}}

== Further reading ==

{{Refbegin|2}}

* Nils J. Nilsson, ''[https://ai.stanford.edu/people/nilsson/mlbook.html Introduction to Machine Learning]''.

* [[Trevor Hastie]], [[Robert Tibshirani]] and [[Jerome H. Friedman]] (2001). ''[https://web.stanford.edu/~hastie/ElemStatLearn/ The Elements of Statistical Learning]'', Springer. {{ISBN|0-387-95284-5}}.

* [[Pedro Domingos]] (September 2015), ''[[The Master Algorithm]]'', Basic Books, {{ISBN|978-0-465-06570-7}}

* Ian H. Witten and Eibe Frank (2011). ''Data Mining: Practical Machine Learning Tools and Techniques'', Morgan Kaufmann, 664 pp., {{ISBN|978-0-12-374856-0}}.

* Ethem Alpaydin (2004). ''Introduction to Machine Learning'', MIT Press, {{ISBN|978-0-262-01243-0}}.

* [[David J. C. MacKay]]. ''[http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms]'' Cambridge: Cambridge University Press, 2003. {{ISBN|0-521-64298-1}}

* [[Richard O. Duda]], [[Peter E. Hart]], David G. Stork (2001) ''Pattern classification'' (2nd edition), Wiley, New York, {{ISBN|0-471-05669-3}}.

* [[Christopher Bishop]] (1995). ''Neural Networks for Pattern Recognition'', Oxford University Press. {{ISBN|0-19-853864-2}}.

* Stuart Russell and Peter Norvig (2009). ''[http://aima.cs.berkeley.edu/ Artificial Intelligence – A Modern Approach]''. Pearson, {{ISBN|9789332543515}}.

* [[Ray Solomonoff]], ''An Inductive Inference Machine'', IRE Convention Record, Section on Information Theory, Part 2, pp. 56–62, 1957.

* [[Ray Solomonoff]], ''[http://world.std.com/~rjs/indinf56.pdf An Inductive Inference Machine]''. A privately circulated report from the 1956 [[Dartmouth workshop|Dartmouth Summer Research Conference on AI]].

{{Refend}}

== External links ==

{{Commons category}}

*[https://web.archive.org/web/20171230081341/http://machinelearning.org/ International Machine Learning Society]

*[https://mloss.org/ mloss] is an academic database of open-source machine learning software.

*[https://developers.google.com/machine-learning/crash-course/ Machine Learning Crash Course] by [[Google]]. This is a free course on machine learning through the use of [[TensorFlow]].

{{Computer science}}

[[Category:Machine learning| ]]

[[Category:Cybernetics]]

[[Category:Learning]]

<noinclude>

This page was moved from [[wikipedia:en:Machine learning]]. Its edit history can be viewed at [[机器学习/edithistory]]</noinclude>

[[Category:待整理页面]]
