更改

机器学习 Machine Learning

2020年7月23日 (四) 08:45的版本

添加83,037字节、 2020年7月23日 (四) 08:45

第344行：第344行：

:''主文章：[https://en.wikipedia.org/wiki/Learning_classifier_system 学习分类器系统]''

学习分类器系统(LCS)是一组[https://en.wikipedia.org/wiki/Rule-based_machine_learning 基于规则的机器学习]算法，它将发现组件(通常是[https://en.wikipedia.org/wiki/Genetic_algorithm 遗传算法])与学习组件(执行有[https://en.wikipedia.org/wiki/Supervised_learning 监督学习]、[https://en.wikipedia.org/wiki/Reinforcement_learning 强化学习]或[https://en.wikipedia.org/wiki/Unsupervised_learning 无监督学习])结合起来。它们试图找出一套与情境相关的规则，这些规则以一种[https://en.wikipedia.org/wiki/Piecewise 分段]的方式，集体存储和应用知识，以便进行预测<ref>{{Cite journal|last=Urbanowicz|first=Ryan J.|last2=Moore|first2=Jason H.|date=2009-09-22|title=Learning Classifier Systems: A Complete Introduction, Review, and Roadmap|url=http://www.hindawi.com/archive/2009/736398/|journal=Journal of Artificial Evolution and Applications|language=en|volume=2009|pages=1–25|issn=1687-6229}}</ref>。

+

=== 学习算法的分类 Types of learning algorithms ===

+

The types of machine learning algorithms differ in their approach, the type of data they input and output, and the type of task or problem that they are intended to solve.

+


不同类型的机器学习算法的方法、输入和输出的数据类型以及它们要解决的任务或问题的类型都有所不同。

+

==== 监督学习 Supervised learning ====

+

[[File:Svm max sep hyperplane with margin.png|thumb|A [[support vector machine]] is a supervised learning model that divides the data into regions separated by a [[linear classifier|linear boundary]]. Here, the linear boundary divides the black circles from the white.]]

+

A support vector machine is a supervised learning model that divides the data into regions separated by a linear boundary. Here, the linear boundary divides the black circles from the white.

+

支持向量机是一个有监督学习模型，它将数据划分为由线性边界分隔的区域。在这里，线性边界将黑色圆圈和白色圆圈分开。

+

Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs.<ref>{{cite book |last1=Russell |first1=Stuart J. |last2=Norvig |first2=Peter |title=Artificial Intelligence: A Modern Approach |date=2010 |publisher=Prentice Hall |isbn=9780136042594 |edition=Third|title-link=Artificial Intelligence: A Modern Approach }}</ref> The data is known as [[training data]], and consists of a set of training examples. Each training example has one or more inputs and the desired output, also known as a supervisory signal. In the mathematical model, each training example is represented by an [[array data structure|array]] or vector, sometimes called a feature vector, and the training data is represented by a [[Matrix (mathematics)|matrix]]. Through iterative optimization of an [[Loss function|objective function]], supervised learning algorithms learn a function that can be used to predict the output associated with new inputs.<ref>{{cite book |last1=Mohri |first1=Mehryar |last2=Rostamizadeh |first2=Afshin |last3=Talwalkar |first3=Ameet |title=Foundations of Machine Learning |date=2012 |publisher=The MIT Press |isbn=9780262018258}}</ref> An optimal function will allow the algorithm to correctly determine the output for inputs that were not a part of the training data. An algorithm that improves the accuracy of its outputs or predictions over time is said to have learned to perform that task.<ref name="Mitchell-1997" />

+

Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs. The data is known as training data, and consists of a set of training examples. Each training example has one or more inputs and the desired output, also known as a supervisory signal. In the mathematical model, each training example is represented by an array or vector, sometimes called a feature vector, and the training data is represented by a matrix. Through iterative optimization of an objective function, supervised learning algorithms learn a function that can be used to predict the output associated with new inputs. An optimal function will allow the algorithm to correctly determine the output for inputs that were not a part of the training data. An algorithm that improves the accuracy of its outputs or predictions over time is said to have learned to perform that task.

+

有监督学习算法会为一个同时包含输入和期望输出的数据集建立数学模型。这些数据被称为训练数据，由一组训练样本组成。每个训练样本都有一个或多个输入和期望的输出，期望的输出也称为监督信号。在数学模型中，每个训练样本由一个数组或向量表示，有时也称为'''特征向量 Feature Vector'''，训练数据由一个矩阵表示。通过对目标函数的迭代优化，有监督学习算法可以学习到一个用来预测与新输入相关的输出的函数。一个最优的函数能让算法对不属于训练数据的输入也正确地确定输出（即模型具有良好的泛化能力）。随着时间的推移不断提高其输出或预测精度的算法，被称为已学会执行该任务。
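The paragraph above can be illustrated with a minimal sketch. It assumes mean squared error as the objective function and plain gradient descent as the iterative optimizer. The data, the function names `fit` and `predict`, and the hyperparameters are all made up for illustration and do not come from the article.

```python
# Training data: each row of X is a feature vector, y holds the desired outputs.
# The numbers are invented for illustration (they follow y = 2*x + 1 exactly).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]


def predict(weights, bias, features):
    """Linear model: weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def fit(X, y, learning_rate=0.05, steps=5000):
    """Minimise mean squared error (the objective function) by gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [predict(weights, bias, row) - target for row, target in zip(X, y)]
        for j in range(n_features):
            grad_w = 2.0 / n_samples * sum(e * row[j] for e, row in zip(errors, X))
            weights[j] -= learning_rate * grad_w
        grad_b = 2.0 / n_samples * sum(errors)
        bias -= learning_rate * grad_b
    return weights, bias


weights, bias = fit(X, y)
unseen = predict(weights, bias, [10.0])  # an input that was not in the training data
```

Here the learned function is then applied to an input outside the training set, which is the generalization step the text describes.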

+

Types of supervised learning algorithms include [[Active learning (machine learning)|active learning]], [[Statistical classification|classification]] and [[Regression analysis|regression]].<ref>{{cite book|last=Alpaydin|first=Ethem|title=Introduction to Machine Learning|date=2010|publisher=MIT Press|isbn=978-0-262-01243-0|page=9|url=https://books.google.com/books?id=7f5bBAAAQBAJ&printsec=frontcover#v=onepage&q=classification&f=false}}</ref> Classification algorithms are used when the outputs are restricted to a limited set of values, and regression algorithms are used when the outputs may have any numerical value within a range. As an example, for a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email.

+

Types of supervised learning algorithms include active learning, classification and regression. Classification algorithms are used when the outputs are restricted to a limited set of values, and regression algorithms are used when the outputs may have any numerical value within a range. As an example, for a classification algorithm that filters emails, the input would be an incoming email, and the output would be the name of the folder in which to file the email.

+

监督式学习算法的类型包括'''主动学习 Active Learning'''、'''分类 Classification'''和'''回归 Regression'''。当输出被限制在一个有限的值集内时使用分类算法，当输出在一个范围内可能有任何数值时使用回归算法。例如，对于过滤电子邮件的分类算法，输入将是一封收到的电子邮件，输出将是用于将电子邮件归档的文件夹的名称。

+

[[Similarity learning]] is an area of supervised machine learning closely related to regression and classification, but the goal is to learn from examples using a similarity function that measures how similar or related two objects are. It has applications in [[ranking]], [[recommendation systems]], visual identity tracking, face verification, and speaker verification.

+

Similarity learning is an area of supervised machine learning closely related to regression and classification, but the goal is to learn from examples using a similarity function that measures how similar or related two objects are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification.

+

'''相似性学习 Similarity Learning'''是监督学习领域中与回归和分类密切相关的一个领域，但其目标是从实例中学习如何通过使用相似性函数来衡量两个对象之间的相似程度。它在排名、推荐系统、视觉身份跟踪、人脸验证和'''语者验证 Speaker Verification'''等方面都有应用。

+

==== 无监督学习 Unsupervised learning ====

+

Unsupervised learning algorithms take a set of data that contains only inputs, and find structure in the data, like grouping or clustering of data points. The algorithms, therefore, learn from data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. A central application of unsupervised learning is in the field of [[density estimation]] in [[statistics]], such as finding the [[probability density function]].<ref name="JordanBishop2004">{{cite book |first1=Michael I. |last1=Jordan |first2=Christopher M. |last2=Bishop |chapter=Neural Networks |editor=Allen B. Tucker |title=Computer Science Handbook, Second Edition (Section VII: Intelligent Systems) |location=Boca Raton, Florida |publisher=Chapman & Hall/CRC Press LLC |year=2004 |isbn=978-1-58488-360-9 }}</ref> Unsupervised learning also encompasses other domains involving summarizing and explaining data features.

+

Unsupervised learning algorithms take a set of data that contains only inputs, and find structure in the data, like grouping or clustering of data points. The algorithms, therefore, learn from data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning algorithms identify commonalities in the data and react based on the presence or absence of such commonalities in each new piece of data. A central application of unsupervised learning is in the field of density estimation in statistics, such as finding the probability density function. Unsupervised learning also encompasses other domains involving summarizing and explaining data features.

+

'''无监督学习 Unsupervised Learning'''算法接受一组只包含输入的数据，并在数据中寻找结构，例如对数据点进行分组或聚类。因此，算法从未被标记、分类或归类的数据中学习。无监督学习算法不是对反馈做出响应，而是识别数据中的共性，并根据每个新数据中是否存在这些共性做出反应。无监督学习的一个核心应用是统计学中的密度估计领域，比如寻找概率密度函数。无监督学习还涵盖其他涉及总结和解释数据特征的领域。

+

Cluster analysis is the assignment of a set of observations into subsets (called ''clusters'') so that observations within the same cluster are similar according to one or more predesignated criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some ''similarity metric'' and evaluated, for example, by ''internal compactness'', or the similarity between members of the same cluster, and ''separation'', the difference between clusters. Other methods are based on ''estimated density'' and ''graph connectivity''.

+

Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to one or more predesignated criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated, for example, by internal compactness, or the similarity between members of the same cluster, and separation, the difference between clusters. Other methods are based on estimated density and graph connectivity.

+

'''聚类分析 Cluster Analysis'''是将一组观测值划分为若干子集（称为簇），使得同一簇内的观测值按照一个或多个预先指定的准则彼此相似，而来自不同簇的观测值则互不相似。不同的聚类技术对数据的结构做出不同的假设，这些假设通常由某种相似性度量来定义，并通过内部紧凑性（同一簇成员之间的相似性）和分离度（簇与簇之间的差异）等指标来评估。其他方法则基于估计密度和图连通性。
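As one possible illustration of the clustering description above, the following sketch uses k-means (Lloyd's algorithm), which the article does not name; it is chosen here only as a simple example. The similarity metric is assumed to be squared Euclidean distance, and the points and helper names are hypothetical.

```python
def squared_distance(p, q):
    """Assumed similarity metric: squared Euclidean distance (smaller = more similar)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def compactness(centroids, clusters):
    """Within-cluster sum of squared distances; lower means tighter clusters."""
    return sum(squared_distance(p, c) for c, pts in zip(centroids, clusters) for p in pts)


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
tightness = compactness(centroids, clusters)                 # internal compactness
separation = squared_distance(centroids[0], centroids[1])    # difference between clusters
```

A good clustering under these two criteria has a small `tightness` relative to `separation`.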

+

==== 半监督学习 Semi-supervised learning ====

+

Semi-supervised learning falls between [[unsupervised learning]] (without any labeled training data) and [[supervised learning]] (with completely labeled training data). Some of the training examples are missing training labels, yet many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy.

+

Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Some of the training examples are missing training labels, yet many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy.

+

'''半监督学习 Semi-supervised Learning'''介于无监督式学习（没有任何标记的训练数据）和有监督学习（完全标记的训练数据）之间。有些训练样本缺少训练标签，但许多机器学习研究人员发现，如果将未标记的数据与少量标记的数据结合使用，可以大大提高学习的准确性。

+

In [[Weak supervision|weakly supervised learning]], the training labels are noisy, limited, or imprecise; however, these labels are often cheaper to obtain, resulting in larger effective training sets.<ref>{{Cite web|url=https://hazyresearch.github.io/snorkel/blog/ws_blog_post.html|title=Weak Supervision: The New Programming Paradigm for Machine Learning|author1=Alex Ratner |author2=Stephen Bach |author3=Paroma Varma |author4=Chris |others= referencing work by many other members of Hazy Research|website=hazyresearch.github.io|access-date=2019-06-06}}</ref>

+

In weakly supervised learning, the training labels are noisy, limited, or imprecise; however, these labels are often cheaper to obtain, resulting in larger effective training sets.

+

在'''弱监督学习 Weak Supervision'''中，训练标签是有噪声的、有限的或不精确的；然而，这些标签的获取成本通常更低，因此可以得到更大的有效训练集。

+

==== 强化学习 Reinforcement learning ====

+

Reinforcement learning is an area of machine learning concerned with how [[software agent]]s ought to take [[Action selection|actions]] in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as [[game theory]], [[control theory]], [[operations research]], [[information theory]], [[simulation-based optimization]], [[multi-agent system]]s, [[swarm intelligence]], [[statistics]] and [[genetic algorithm]]s. In machine learning, the environment is typically represented as a [[Markov Decision Process]] (MDP). Many reinforcement learning algorithms use [[dynamic programming]] techniques.<ref>{{Cite book|title=Reinforcement learning and markov decision processes|author1=van Otterlo, M.|author2=Wiering, M.|journal=Reinforcement Learning |volume=12|pages=3–42 |year=2012 |doi=10.1007/978-3-642-27645-3_1|series=Adaptation, Learning, and Optimization|isbn=978-3-642-27644-6}}</ref> Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.

+

Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Due to its generality, the field is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically represented as a Markov Decision Process (MDP). Many reinforcement learning algorithms use dynamic programming techniques. Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of the MDP, and are used when exact models are infeasible. Reinforcement learning algorithms are used in autonomous vehicles or in learning to play a game against a human opponent.

+

强化学习是机器学习的一个领域，它研究软件智能体应该如何在某个环境中采取行动，以便最大化某种累积收益。由于其普遍性，许多其他学科也在研究该领域，如'''博弈论 Game Theory'''、'''控制理论 Control Theory'''、'''运筹学 Operations Research'''、'''信息论 Information Theory'''、'''基于仿真的优化 Simulation-based Optimization'''、'''多主体系统 Multi-agent System'''、'''群体智能 Swarm Intelligence'''、'''统计学 Statistics'''和'''遗传算法 Genetic Algorithm'''。在机器学习中，环境通常被表示为'''马尔可夫决策过程 Markov Decision Process，MDP'''。许多强化学习算法使用动态规划技术。强化学习算法不假设已知 MDP 的精确数学模型，适用于精确模型不可行的情况。强化学习算法常用于自动驾驶车辆或学习与人类对手进行博弈。
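To make the idea of maximizing cumulative reward without a known model concrete, here is a minimal sketch of tabular Q-learning, a technique the article does not name and which is used here only as an example. The four-state environment, its `step` function, and the constants are hypothetical. For reproducibility the agent sweeps over every state-action pair instead of exploring at random, but it still learns only from sampled transitions.

```python
N_STATES = 4          # states 0..3 on a line; state 3 is the goal (terminal)
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor applied to future reward
ALPHA = 0.5           # learning rate


def step(state, action):
    """Hypothetical environment: returns (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


# Q[s][a] estimates the cumulative discounted reward of taking action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(200):
    for state in range(N_STATES - 1):          # the terminal state is never acted from
        for a_index, action in enumerate(ACTIONS):
            next_state, reward = step(state, action)
            target = reward + GAMMA * max(Q[next_state])
            Q[state][a_index] += ALPHA * (target - Q[state][a_index])

# Greedy policy: in each non-terminal state, pick the action with the highest estimate.
policy = [ACTIONS[row.index(max(row))] for row in Q[:-1]]
```

The update uses only the observed `next_state` and `reward`, so the agent never needs the transition probabilities of the underlying MDP.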

+

==== 自学习 Self learning ====

+

Self-learning as a machine learning paradigm was introduced in 1982 along with a neural network capable of self-learning named Crossbar Adaptive Array (CAA).<ref>Bozinovski, S. (1982). "A self-learning system using secondary reinforcement". In Trappl, Robert (ed.). Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North Holland. pp. 397–402. {{ISBN|978-0-444-86488-8}}.</ref> It is learning with no external rewards and no external teacher advice. The CAA self-learning algorithm computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about consequence situations. The system is driven by the interaction between cognition and emotion.<ref>Bozinovski, Stevo (2014) "Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981." Procedia Computer Science p. 255-263</ref>

+

Self-learning as a machine learning paradigm was introduced in 1982 along with a neural network capable of self-learning named Crossbar Adaptive Array (CAA). It is learning with no external rewards and no external teacher advice. The CAA self-learning algorithm computes, in a crossbar fashion, both decisions about actions and emotions (feelings) about consequence situations. The system is driven by the interaction between cognition and emotion.

+

自学习作为一种机器学习范式于1982年提出，同时提出的还有一种具有自学习能力的神经网络，称为'''交叉开关自适应阵列 Crossbar Adaptive Array，CAA'''。这是一种没有外部奖励、也没有外部教师建议的学习。CAA 自学习算法以交叉开关的方式同时计算关于动作的决策和关于结果情境的情绪（感觉）。该系统由认知和情绪之间的相互作用驱动。

+

The self-learning algorithm updates a memory matrix W = ||w(a,s)|| and in each iteration executes the following machine learning routine:

In situation s, perform action a;

Receive consequence situation s’;

Compute the emotion of being in the consequence situation, v(s’);

Update the crossbar memory: w’(a,s) = w(a,s) + v(s’).

+

自学习算法更新记忆矩阵 W = ||w(a,s)||，并在每次迭代中执行以下机器学习例程：

在情境 s 中执行动作 a；

接收结果情境 s’；

计算处于结果情境 s’ 中的情绪 v(s’)；

更新交叉开关记忆：w’(a,s) = w(a,s) + v(s’)。
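The four-step routine can be sketched in code as follows. Only the update rule w’(a,s) = w(a,s) + v(s’) is taken from the text. The toy environment, the `genome` vector of initial emotions, and the choice to read v(s’) directly from that vector are assumptions made for illustration and are not claimed to match the original CAA design.

```python
SITUATIONS = 3   # hypothetical world: situations 0, 1, 2 on a line
ACTIONS = 2      # action 0 moves left, action 1 moves right

# Assumed genome vector: the initial emotion attached to each situation
# (situation 2 is desirable, situation 0 is undesirable).
genome = [-1.0, 0.0, 1.0]

# Crossbar memory W = ||w(a, s)||, stored as W[a][s].
W = [[0.0] * SITUATIONS for _ in range(ACTIONS)]


def environment(s, a):
    """Hypothetical behavioral environment: returns the consequence situation s'."""
    move = 1 if a == 1 else -1
    return min(max(s + move, 0), SITUATIONS - 1)


def emotion(s):
    """Assumed v(s): the genome value of situation s."""
    return genome[s]


def caa_iteration(s, a):
    s_next = environment(s, a)   # steps 1 and 2: perform a in s, receive s'
    v = emotion(s_next)          # step 3: emotion of being in s'
    W[a][s] += v                 # step 4: w'(a,s) = w(a,s) + v(s')
    return s_next


# Try both actions once from the middle situation.
for action in range(ACTIONS):
    caa_iteration(1, action)

# After the updates, the memory favors the action that led to the desirable situation.
preferred = max(range(ACTIONS), key=lambda a: W[a][1])
```

Note that no reward signal is passed in from outside: the only quantity written to memory is the internally computed emotion of the consequence situation.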

+

It is a system with only one input, situation s, and only one output, action (or behavior) a. There is neither a separate reinforcement input nor an advice input from the environment. The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments: one is the behavioral environment, where it behaves, and the other is the genetic environment, from which it receives, initially and only once, initial emotions about situations to be encountered in the behavioral environment. After receiving the genome (species) vector from the genetic environment, the CAA learns a goal-seeking behavior in an environment that contains both desirable and undesirable situations.<ref>Bozinovski, S. (2001) "Self-learning agents: A connectionist theory of emotion based on crossbar value judgment." Cybernetics and Systems 32(6) 637-667.</ref>

+

It is a system with only one input, situation s, and only one output, action (or behavior) a. There is neither a separate reinforcement input nor an advice input from the environment. The backpropagated value (secondary reinforcement) is the emotion toward the consequence situation. The CAA exists in two environments: one is the behavioral environment, where it behaves, and the other is the genetic environment, from which it receives, initially and only once, initial emotions about situations to be encountered in the behavioral environment. After receiving the genome (species) vector from the genetic environment, the CAA learns a goal-seeking behavior in an environment that contains both desirable and undesirable situations.

+

它是一个只有一个输入（情境 s）和一个输出（动作或行为 a）的系统。既没有单独的强化输入，也没有来自环境的建议输入。反向传播的值（二次强化）是对结果情境的情绪。CAA 存在于两种环境中：一种是行为环境，它在其中表现出行为；另一种是遗传环境，它最初且仅一次地从中接收关于将在行为环境中遇到的各种情境的初始情绪。在从遗传环境中获得基因组（物种）向量后，CAA 会在一个既包含理想情境又包含不理想情境的环境中学习一种寻求目标的行为。

+

==== 特征学习 Feature learning ====

+

Several learning algorithms aim at discovering better representations of the inputs provided during training.<ref name="pami">{{cite journal |author1=Y. Bengio |author2=A. Courville |author3=P. Vincent |title=Representation Learning: A Review and New Perspectives |journal= IEEE Transactions on Pattern Analysis and Machine Intelligence|year=2013|doi=10.1109/tpami.2013.50 |pmid=23787338 |volume=35 |issue=8 |pages=1798–1828|arxiv=1206.5538 }}</ref> Classic examples include [[principal components analysis]] and cluster analysis. Feature learning algorithms, also called representation learning algorithms, often attempt to preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. This technique allows reconstruction of the inputs coming from the unknown data-generating distribution, while not being necessarily faithful to configurations that are implausible under that distribution. This replaces manual [[feature engineering]], and allows a machine to both learn the features and use them to perform a specific task.

+

Several learning algorithms aim at discovering better representations of the inputs provided during training. Classic examples include principal components analysis and cluster analysis. Feature learning algorithms, also called representation learning algorithms, often attempt to preserve the information in their input but also transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions. This technique allows reconstruction of the inputs coming from the unknown data-generating distribution, while not being necessarily faithful to configurations that are implausible under that distribution. This replaces manual feature engineering, and allows a machine to both learn the features and use them to perform a specific task.

+

一些学习算法旨在为训练期间提供的输入发现更好的表示，典型的例子包括'''主成分分析 Principal Components Analysis'''和'''聚类分析 Cluster Analysis'''。'''特征学习 Feature Learning'''算法，也称为'''表征学习 Representation Learning'''算法，通常试图保留输入中的信息，同时将其转换为更有用的形式，这通常作为执行分类或预测之前的预处理步骤。这种技术可以重构来自未知数据生成分布的输入，但不必忠实于在该分布下不太可能出现的配置。这取代了手工特征工程，使机器既能学习特征，又能使用这些特征来执行特定的任务。
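Principal components analysis, named above as a classic example, can be sketched in a few lines. This version finds only the first principal component and uses power iteration on the sample covariance matrix, which is one of several ways to compute it. The data and the function name `first_principal_component` are invented for illustration.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector of greatest variance) via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


# Invented 2-D inputs that lie close to the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
means, direction = first_principal_component(rows)

# Learned 1-D representation: projection of each centered input onto the component.
features = [sum((r[j] - means[j]) * direction[j] for j in range(2)) for r in rows]
```

Each two-dimensional input is replaced by a single number in `features`, which preserves most of the variation in the data and could serve as a pre-processing step before classification or prediction.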

+

Feature learning can be either supervised or unsupervised. In supervised feature learning, features are learned using labeled input data. Examples include [[artificial neural network]]s, [[multilayer perceptron]]s, and supervised [[dictionary learning]]. In unsupervised feature learning, features are learned with unlabeled input data. Examples include dictionary learning, [[independent component analysis]], [[autoencoder]]s, [[matrix decomposition|matrix factorization]]<ref>{{cite conference |author1=Nathan Srebro |author2=Jason D. M. Rennie |author3=Tommi S. Jaakkola |title=Maximum-Margin Matrix Factorization |conference=[[Conference on Neural Information Processing Systems|NIPS]] |year=2004}}</ref> and various forms of [[Cluster analysis|clustering]].<ref name="coates2011">{{cite conference
|last1 = Coates
|first1 = Adam
|last2 = Lee
|first2 = Honglak
|last3 = Ng
|first3 = Andrew Y.
|title = An analysis of single-layer networks in unsupervised feature learning
|conference = Int'l Conf. on AI and Statistics (AISTATS)
|year = 2011
|url = http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CoatesNL11.pdf
|access-date = 2018-11-25
|archive-url = https://web.archive.org/web/20170813153615/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2011_CoatesNL11.pdf
|archive-date = 2017-08-13
|url-status = dead
}}</ref><ref>{{cite conference |last1 = Csurka |first1 = Gabriella|last2 = Dance |first2 = Christopher C.|last3 = Fan |first3 = Lixin|last4 = Willamowski |first4 = Jutta|last5 = Bray |first5 = Cédric|title = Visual categorization with bags of keypoints|conference = ECCV Workshop on Statistical Learning in Computer Vision|year = 2004|url = https://www.cs.cmu.edu/~efros/courses/LBMV07/Papers/csurka-eccv-04.pdf}}</ref><ref name="jurafsky">{{cite book |title=Speech and Language Processing |author1=Daniel Jurafsky |author2=James H. Martin |publisher=Pearson Education International |year=2009 |pages=145–146}}</ref>

+

Feature learning can be either supervised or unsupervised. In supervised feature learning, features are learned using labeled input data. Examples include artificial neural networks, multilayer perceptrons, and supervised dictionary learning. In unsupervised feature learning, features are learned with unlabeled input data. Examples include dictionary learning, independent component analysis, autoencoders, matrix factorization and various forms of clustering.

+

特征学习可以是有监督的，也可以是无监督的。在有监督的特征学习中，特征是利用带标记的输入数据学习的，例如'''人工神经网络 Artificial Neural Networks'''、'''多层感知机 Multilayer Perceptrons'''和有监督'''字典学习 Dictionary Learning'''。在无监督的特征学习中，特征是通过未标记的输入数据学习的，例如字典学习、'''独立成分分析 Independent Component Analysis'''、'''自动编码器 Autoencoders'''、'''矩阵分解 Matrix Factorization'''和各种形式的聚类。

+

[[Manifold learning]] algorithms attempt to do so under the constraint that the learned representation is low-dimensional. [[Sparse coding]] algorithms attempt to do so under the constraint that the learned representation is sparse, meaning that the mathematical model has many zeros. [[Multilinear subspace learning]] algorithms aim to learn low-dimensional representations directly from [[tensor]] representations for multidimensional data, without reshaping them into higher-dimensional vectors.<ref>{{cite journal |first1=Haiping |last1=Lu |first2=K.N. |last2=Plataniotis |first3=A.N. |last3=Venetsanopoulos |url=http://www.dsp.utoronto.ca/~haiping/Publication/SurveyMSL_PR2011.pdf |title=A Survey of Multilinear Subspace Learning for Tensor Data |journal=Pattern Recognition |volume=44 |number=7 |pages=1540–1551 |year=2011 |doi=10.1016/j.patcog.2011.01.004}}</ref> [[Deep learning]] algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.<ref>{{cite book | title = Learning Deep Architectures for AI | author = Yoshua Bengio | publisher = Now Publishers Inc. | year = 2009 | isbn = 978-1-60198-294-0 | pages = 1–3 | url = https://books.google.com/books?id=cq5ewg7FniMC&pg=PA3| author-link = Yoshua Bengio }}</ref>

+

Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse, meaning that the mathematical model has many zeros. Multilinear subspace learning algorithms aim to learn low-dimensional representations directly from tensor representations for multidimensional data, without reshaping them into higher-dimensional vectors. Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.

+

'''流形学习 Manifold Learning'''算法试图在所学表示为低维的约束条件下学习表示。'''稀疏编码 Sparse Coding'''算法试图在所学表示为稀疏的约束条件下学习表示，稀疏意味着数学模型中有许多零。'''多线性子空间学习 Multilinear Subspace Learning'''算法旨在直接从多维数据的张量表示中学习低维表示，而无需将它们重塑为高维向量。'''深度学习 Deep Learning'''算法可以发现多个层次的表示，即特征的层次结构，其中更高层次、更抽象的特征由较低层次的特征来定义（或生成较低层次的特征）。有人认为，智能机器就是能够学习到这样一种表示的机器：这种表示能够解开解释观测数据的潜在变化因素。

+

Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensory data has not yielded to attempts to algorithmically define specific features. An alternative is to discover such features or representations through examination, without relying on explicit algorithms.

+


特征学习的动机在于，分类等机器学习任务通常需要在数学上和计算上便于处理的输入。然而，图像、视频和传感数据等真实世界的数据，一直难以通过算法来定义特定的特征。另一种方法是通过考察数据来发现这些特征或表示，而不依赖于显式算法。

+

==== 稀疏字典学习 Sparse dictionary learning ====

+

Sparse dictionary learning is a feature learning method where a training example is represented as a linear combination of [[basis function]]s, and the representation is assumed to be a [[sparse matrix]]. The problem is [[strongly NP-hard]] and difficult to solve approximately.<ref>{{cite journal |first=A. M. |last=Tillmann |title=On the Computational Intractability of Exact and Approximate Dictionary Learning |journal=IEEE Signal Processing Letters |volume=22 |issue=1 |year=2015 |pages=45–49 |doi=10.1109/LSP.2014.2345761|bibcode=2015ISPL...22...45T |arxiv=1405.6664 }}</ref> A popular [[heuristic]] method for sparse dictionary learning is the [[K-SVD]] algorithm. Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine the class to which a previously unseen example belongs. Given a dictionary that has already been built for each class, a new example is associated with the class whose dictionary gives it the best sparse representation. Sparse dictionary learning has also been applied in [[image de-noising]]. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.<ref>Aharon, M, M Elad, and A Bruckstein. 2006. "[http://sites.fas.harvard.edu/~cs278/papers/ksvd.pdf K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation]." Signal Processing, IEEE Transactions on 54 (11): 4311–4322</ref>

+

Sparse dictionary learning is a feature learning method where a training example is represented as a linear combination of basis functions, and the representation is assumed to be a sparse matrix. The problem is strongly NP-hard and difficult to solve approximately. A popular heuristic method for sparse dictionary learning is the K-SVD algorithm. Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine the class to which a previously unseen example belongs. Given a dictionary that has already been built for each class, a new example is associated with the class whose dictionary gives it the best sparse representation. Sparse dictionary learning has also been applied in image de-noising. The key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot.

+

稀疏字典学习是一种特征学习方法，其中训练样本被表示为基函数的线性组合，并假设该表示是稀疏矩阵。该问题是强 NP-hard 的，并且难以近似求解。一种流行的'''启发式 Heuristic'''稀疏字典学习方法是 K-SVD 算法。稀疏字典学习已经应用于多种场景。在分类中，问题在于确定先前未见的样本所属的类；在已为每个类构建好字典的情况下，新样本被归入其字典能对它给出最佳稀疏表示的类。稀疏字典学习也被应用于图像去噪，其关键思想是，一个干净的图像块（patch）可以由图像字典稀疏地表示，但噪声不能。

+

==== 异常检测 Anomaly detection ====

+

In [[data mining]], anomaly detection, also known as outlier detection, is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data.<ref name=":0">{{Citation|last=Zimek|first=Arthur|title=Outlier Detection|date=2017|encyclopedia=Encyclopedia of Database Systems|pages=1–5|publisher=Springer New York|language=en|doi=10.1007/978-1-4899-7993-3_80719-1|isbn=9781489979933|last2=Schubert|first2=Erich}}</ref> Typically, the anomalous items represent an issue such as [[bank fraud]], a structural defect, medical problems or errors in a text. Anomalies are referred to as [[outlier]]s, novelties, noise, deviations and exceptions.<ref>{{cite journal | last1 = Hodge | first1 = V. J. | last2 = Austin | first2 = J. | doi = 10.1007/s10462-004-4304-y | title = A Survey of Outlier Detection Methodologies | journal = Artificial Intelligence Review| volume = 22 | issue = 2 | pages = 85–126 | year = 2004 | url = http://eprints.whiterose.ac.uk/767/1/hodgevj4.pdf| pmid = | pmc = | citeseerx = 10.1.1.318.4023 }}</ref>

+

In data mining, anomaly detection, also known as outlier detection, is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. Typically, the anomalous items represent an issue such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are referred to as outliers, novelties, noise, deviations and exceptions.

+

在数据挖掘中，'''异常检测 Anomaly / Outlier detection'''是指识别那些引起怀疑的稀有项目、事件或者观测结果，它们与其他大多数数据有很大的不同。一般来说，这些不正常的项目都可以反映出数据背后的一个问题，如银行欺诈、结构缺陷、医疗问题或文本中的错误。异常也被称为''异常值 Outliers''、''奇异值 Novelties''、''噪音 Noise''、''偏差 Deviations''和''异常 Exceptions''。

+

In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular, unsupervised algorithms) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.<ref>{{cite journal| first=Paul | last=Dokas | first2=Levent |last2=Ertoz |first3=Vipin |last3=Kumar |first4=Aleksandar |last4=Lazarevic |first5=Jaideep |last5=Srivastava |first6=Pang-Ning |last6=Tan | title=Data mining for network intrusion detection | year=2002 | journal=Proceedings NSF Workshop on Next Generation Data Mining | url=http://www.csee.umbc.edu/~kolari1/Mining/ngdm/dokas.pdf}}</ref>

+

In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts in activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object, and many outlier detection methods (in particular, unsupervised algorithms) will fail on such data, unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.

+

特别是在滥用和网络入侵检测的背景下，人们感兴趣的往往不是罕见的对象，而是突发性的活动。这种模式并不符合异常值作为稀有对象的通用统计学定义，而且许多异常检测方法（特别是无监督的算法）将无法处理这类数据，除非它已经被适当地聚合处理。相反地，数据聚类算法可以检测到这些模式形成的微团簇。

+

Three broad categories of anomaly detection techniques exist.<ref name="ChandolaSurvey">{{cite journal |last1=Chandola |first1=V. |last2=Banerjee |first2=A. |last3=Kumar |first3=V. |year=2009 |title=Anomaly detection: A survey|journal=[[ACM Computing Surveys]]|volume=41|issue=3|pages=1–58|doi=10.1145/1541880.1541882|url=https://www.semanticscholar.org/paper/71d1ac92ad36b62a04f32ed75a10ad3259a7218d }}</ref> Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit the remainder of the data set least well. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set and then test the likelihood that a test instance was generated by the model.

+

Three broad categories of anomaly detection techniques exist. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit the remainder of the data set least well. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behavior from a given normal training data set and then test the likelihood that a test instance was generated by the model.

+

异常检测技术有3大类。无监督的异常检测技术在假设数据集中大多数实例都是正常的情况下，通过寻找与数据集其余部分最不相符的实例，来检测未标记测试数据集中的异常。监督式的异常检测技术需要一个被标记为“正常”和“异常”的数据集，并需要训练一个分类器（它与许多其他统计分类问题的关键区别在于异常检测本身的不平衡性）。半监督的异常检测技术从给定的正常训练数据集构建一个表示正常行为的模型，然后检验测试实例由该模型生成的可能性。
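As a minimal sketch of the unsupervised case, the snippet below assumes that most readings are normal and flags those that fit the rest of the data least well, using a modified z-score based on the median absolute deviation (MAD). The function name `mad_outliers`, the threshold of 3.5 and the sample readings are assumptions chosen for illustration, not part of any cited method.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread to measure deviations against
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # deviation when the bulk of the data is roughly normal.
    return [v for v, d in zip(values, abs_dev) if 0.6745 * d / mad > threshold]

# Hypothetical sensor readings with one value far from the rest.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0, 10.0]
outliers = mad_outliers(readings)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts them far less, which matters when the anomalies are part of the data being scored.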

+

==== 机器人学习 Robot learning ====

+

In [[developmental robotics]], [[robot learning]] algorithms generate their own sequences of learning experiences, also known as a curriculum, to cumulatively acquire new skills through self-guided exploration and social interaction with humans. These robots use guidance mechanisms such as active learning, maturation, [[Motor_coordination#Muscle_synergies|motor synergies]] and imitation.

+

In developmental robotics, robot learning algorithms generate their own sequences of learning experiences, also known as a curriculum, to cumulatively acquire new skills through self-guided exploration and social interaction with humans. These robots use guidance mechanisms such as active learning, maturation, motor synergies and imitation.

+

在'''发展型机器人 Developmental robotics'''学习中，机器人学习算法能够产生自己的学习经验序列，也称为课程，通过自我引导的探索来与人类社会进行互动，累积获得新技能。这些机器人在学习的过程中会使用诸如主动学习、成熟、协同运动和模仿等引导机制。

+

==== 关联规则 Association rules ====

+

Association rule learning is a [[rule-based machine learning]] method for discovering relationships between variables in large databases. It is intended to identify strong rules discovered in databases using some measure of "interestingness".<ref name="piatetsky">Piatetsky-Shapiro, Gregory (1991), ''Discovery, analysis, and presentation of strong rules'', in Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., ''Knowledge Discovery in Databases'', AAAI/MIT Press, Cambridge, MA.</ref>

+

Association rule learning is a rule-based machine learning method for discovering relationships between variables in large databases. It is intended to identify strong rules discovered in databases using some measure of "interestingness".

+

'''关联规则学习 Association Rule Learning'''是一种'''基于规则的机器学习 Rule-based machine learning'''方法，用于发现大型数据库中变量之间的关系。它旨在利用某种“有趣度”的度量，识别在数据库中发现的强大规则。

+

Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based machine learning algorithm is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learning algorithms that commonly identify a singular model that can be universally applied to any instance in order to make a prediction.<ref>{{Cite journal|last=Bassel|first=George W.|last2=Glaab|first2=Enrico|last3=Marquez|first3=Julietta|last4=Holdsworth|first4=Michael J.|last5=Bacardit|first5=Jaume|date=2011-09-01|title=Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets|journal=The Plant Cell|language=en|volume=23|issue=9|pages=3101–3116|doi=10.1105/tpc.111.088153|issn=1532-298X|pmc=3203449|pmid=21896882}}</ref> Rule-based machine learning approaches include [[learning classifier system]]s, association rule learning, and [[artificial immune system]]s.

+

Rule-based machine learning is a general term for any machine learning method that identifies, learns, or evolves "rules" to store, manipulate or apply knowledge. The defining characteristic of a rule-based machine learning algorithm is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system. This is in contrast to other machine learning algorithms that commonly identify a singular model that can be universally applied to any instance in order to make a prediction. Rule-based machine learning approaches include learning classifier systems, association rule learning, and artificial immune systems.

+

基于规则的机器学习是任何机器学习方法的通用术语，这些机器学习方法识别、学习或发展“规则”来存储、操作或应用知识。基于规则的机器学习算法这一定义的特点是识别和利用一组共同表示系统捕获的知识的关系规则。这与其他机器学习算法不同，后者往往只识别一个单一的模型，这个模型可以普遍应用于任何实例，以便进行预测。基于规则的机器学习方法包括'''学习分类器系统 Learning Classifier System'''、关联规则学习和'''人工免疫系统 Artificial Immune System'''。

+

Based on the concept of strong rules, [[Rakesh Agrawal (computer scientist)|Rakesh Agrawal]], [[Tomasz Imieliński]] and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by [[point-of-sale]] (POS) systems in supermarkets.<ref name="mining">{{Cite book | last1 = Agrawal | first1 = R. | last2 = Imieliński | first2 = T. | last3 = Swami | first3 = A. | doi = 10.1145/170035.170072 | chapter = Mining association rules between sets of items in large databases | title = Proceedings of the 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93 | pages = 207 | year = 1993 | isbn = 978-0897915922 | pmid = | pmc = | citeseerx = 10.1.1.40.6984 }}</ref> For example, the rule <math>\{\mathrm{onions, potatoes}\} \Rightarrow \{\mathrm{burger}\}</math> found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional [[pricing]] or [[product placement]]s. In addition to [[market basket analysis]], association rules are employed today in application areas including [[Web usage mining]], [[intrusion detection]], [[continuous production]], and [[bioinformatics]]. In contrast with [[sequence mining]], association rule learning typically does not consider the order of items either within a transaction or across transactions.

+

Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami introduced association rules for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule <math>\{\mathrm{onions, potatoes}\} \Rightarrow \{\mathrm{burger}\}</math> found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements. In addition to market basket analysis, association rules are employed today in application areas including Web usage mining, intrusion detection, continuous production, and bioinformatics. In contrast with sequence mining, association rule learning typically does not consider the order of items either within a transaction or across transactions.

+

基于强规则的原理，Rakesh Agrawal、Tomasz Imieliński 和 Arun Swami 引入了关联规则这一概念，用于在超市销售点（POS）系统记录的大规模交易数据中发现产品之间的规律。例如，在超市的销售数据中发现的规则 <math>\{\mathrm{onions, potatoes}\} \Rightarrow \{\mathrm{burger}\}</math> 表明，如果某位顾客同时购买洋葱和土豆，那么他也很可能会购买汉堡肉。这些信息可以作为市场决策的依据，如促销定价或产品陈列。除了市场篮子分析之外，关联规则还应用于 '''Web 使用挖掘 Web Usage Mining'''、'''入侵检测 Intrusion Detection'''、连续生产和'''生物信息学 Bioinformatics'''等应用领域。与序列挖掘不同，关联规则学习通常不考虑事务内或事务之间项目的先后顺序。
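Two common measures of "interestingness" are support and confidence. The sketch below computes both for the onions-and-potatoes rule over a small, made-up list of transactions; the data and the function names `support` and `confidence` are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule is untestable
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical point-of-sale baskets; item order inside a basket is ignored.
transactions = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]

rule_support = support(transactions, {"onions", "potatoes", "burger"})
rule_confidence = confidence(transactions, {"onions", "potatoes"}, {"burger"})
```

With this data the rule holds in two of five baskets (support 0.4), and two of the three baskets that contain onions and potatoes also contain burger (confidence about 0.67). Representing each basket as a set reflects the point made above that the order of items is not considered.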

+

Several common rule-based machine learning algorithms are described below:

+

Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component, typically a [[genetic algorithm]], with a learning component, performing either [[supervised learning]], [[reinforcement learning]], or [[unsupervised learning]]. They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a [[piecewise]] manner in order to make predictions.<ref>{{Cite journal|last=Urbanowicz|first=Ryan J.|last2=Moore|first2=Jason H.|date=2009-09-22|title=Learning Classifier Systems: A Complete Introduction, Review, and Roadmap|journal=Journal of Artificial Evolution and Applications|language=en|volume=2009|pages=1–25|doi=10.1155/2009/736398|issn=1687-6229|doi-access=free}}</ref>

+

Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that combine a discovery component, typically a genetic algorithm, with a learning component, performing either supervised learning, reinforcement learning, or unsupervised learning. They seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions.

+

'''学习分类器系统 Learning Classifier Systems，LCS'''是一系列基于规则的机器学习算法，它结合了一个发现组件，通常是一个遗传算法和一个学习组件，执行监督式学习、强化学习或非监督式学习。他们试图给出一组与上下文相关的规则，而这些规则以分段的方式共同储存和应用知识，以便进行预测。

+

Inductive logic programming (ILP) is an approach to rule-learning using [[logic programming]] as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that [[Entailment|entails]] all positive and no negative examples. [[Inductive programming]] is a related field that considers any kind of programming languages for representing hypotheses (and not only logic programming), such as [[Functional programming|functional programs]].

+

Inductive logic programming (ILP) is an approach to rule-learning using logic programming as a uniform representation for input examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program that entails all positive and no negative examples. Inductive programming is a related field that considers any kind of programming languages for representing hypotheses (and not only logic programming), such as functional programs.

+

'''归纳逻辑规划 Inductive Logic Programming，ILP'''是一种用逻辑规划作为输入示例、背景知识和假设的统一表示的规则学习方法。如果将已知的背景知识进行编码，并将一组示例表示为事实的逻辑数据库，ILP 系统将推导出一个假设的逻辑程序，该程序蕴涵所有正例而不蕴涵任何反例。归纳编程是一个与其相关的领域，它考虑用任何一种编程语言来表示假设（不仅仅是逻辑编程），比如'''函数式程序 Functional programs'''。

+

Inductive logic programming is particularly useful in [[bioinformatics]] and [[natural language processing]]. [[Gordon Plotkin]] and [[Ehud Shapiro]] laid the initial theoretical foundation for inductive machine learning in a logical setting.<ref>Plotkin G.D. [https://www.era.lib.ed.ac.uk/bitstream/handle/1842/6656/Plotkin1972.pdf;sequence=1 Automatic Methods of Inductive Inference], PhD thesis, University of Edinburgh, 1970.</ref><ref>Shapiro, Ehud Y. [http://ftp.cs.yale.edu/publications/techreports/tr192.pdf Inductive inference of theories from facts], Research Report 192, Yale University, Department of Computer Science, 1981. Reprinted in J.-L. Lassez, G. Plotkin (Eds.), Computational Logic, The MIT Press, Cambridge, MA, 1991, pp. 199–254.</ref><ref>Shapiro, Ehud Y. (1983). ''Algorithmic program debugging''. Cambridge, Mass: MIT Press. {{ISBN|0-262-19218-7}}</ref> Shapiro built their first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples.<ref>Shapiro, Ehud Y. "[http://dl.acm.org/citation.cfm?id=1623364 The model inference system]." Proceedings of the 7th international joint conference on Artificial intelligence-Volume 2. Morgan Kaufmann Publishers Inc., 1981.</ref> The term ''inductive'' here refers to [[Inductive reasoning|philosophical]] induction, suggesting a theory to explain observed facts, rather than [[mathematical induction|mathematical]] induction, proving a property for all members of a well-ordered set.

+

Inductive logic programming is particularly useful in bioinformatics and natural language processing. Gordon Plotkin and Ehud Shapiro laid the initial theoretical foundation for inductive machine learning in a logical setting. Shapiro built their first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples. The term inductive here refers to philosophical induction, suggesting a theory to explain observed facts, rather than mathematical induction, proving a property for all members of a well-ordered set.

+

'''归纳逻辑程序设计 Inductive Logic Programming'''在生物信息学和'''自然语言处理 Natural Language Processing'''中特别有用。戈登 · 普洛特金 Gordon Plotkin和埃胡德 · 夏皮罗 Ehud Shapiro为归纳机器学习在逻辑上奠定了最初的理论基础。夏皮罗 Shapiro在1981年实现了他们的第一个模型推理系统: 一个从正反例中归纳推断逻辑程序的 Prolog 程序。这里的”归纳“指的是哲学上的归纳，通过提出一个理论来解释观察到的事实，而不是数学归纳法证明了一个有序集合的所有成员的性质。

+

=== 模型 Models ===

+

Performing machine learning involves creating a [[Statistical model|model]], which is trained on some training data and then can process additional data to make predictions. Various types of models have been used and researched for machine learning systems.

+

Performing machine learning involves creating a model, which is trained on some training data and then can process additional data to make predictions. Various types of models have been used and researched for machine learning systems.

+

执行机器学习需要建立一个算法模型，该模型根据一些训练数据进行训练，然后可以处理额外的数据进行预测。机器学习系统已经使用和研究了各种类型的模型。

+

==== 人工神经网络 Artificial neural networks ====

+

[[File:Colored neural network.svg|thumb|300px|An artificial neural network is an interconnected group of nodes, akin to the vast network of [[neuron]]s in a [[brain]]. Here, each circular node represents an [[artificial neuron]] and an arrow represents a connection from the output of one artificial neuron to the input of another.]]

+

An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one artificial neuron to the input of another.

+

'''人工神经网络 Artificial Neural Network，ANN'''是一组相互连接的节点，类似于大脑中庞大的神经元网络。在这里，每个圆形节点代表一个人工'''神经元 Neuron'''，一个箭头代表从一个人工神经元的输出到另一个输入的连接

+

Artificial neural networks (ANNs), or [[Connectionism|connectionist]] systems, are computing systems vaguely inspired by the [[biological neural network]]s that constitute animal [[brain]]s. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules.

+

Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules.

+

人工神经网络，或'''连接主义系统 Connectionism System'''，是计算机系统受到构成动物大脑的生物神经网络的启发后的研究成果。这种系统通过研究样本来“学习”如何执行任务，通常不需要对任何特定任务的规则进行编程。

+

An ANN is a model based on a collection of connected units or nodes called "[[artificial neuron]]s", which loosely model the [[neuron]]s in a biological [[brain]]. Each connection, like the [[synapse]]s in a biological [[brain]], can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a [[real number]], and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a [[weight (mathematics)|weight]] that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times.

+

An ANN is a model based on a collection of connected units or nodes called "artificial neurons", which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit information, a "signal", from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it. In common ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are called "edges". Artificial neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, artificial neurons are aggregated into layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times.

+

人工神经网络是一种基于一组被称为“人工神经元”的连接单元或节点的模型，人工神经元可以对生物大脑中的神经元进行松散的建模。每一个连接，就像生物大脑中的突触一样，可以将信息（即一个“信号”）从一个人工神经元传递到另一个。接收到信号的人工神经元可以处理它，然后发送信号给连接到它的其他人工神经元。在通常的人工神经网络实现中，人工神经元之间连接处的信号是一个实数，每个人工神经元的输出由其输入之和的某种非线性函数计算得出。人工神经元之间的连接称为“边”。人工神经元和边通常有一个权重，可以随着学习的进行而调整。权重会增加或减少连接处信号的强度。人工神经元可能有一个阈值，只有当聚合信号超过这个阈值时才发送信号。通常，人工神经元聚集成层。不同的层可以对其输入执行不同类型的转换。信号从第一层（输入层）传输到最后一层（输出层），其间可能多次遍历这些层。
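The forward pass described above can be sketched in a few lines: each artificial neuron sums its weighted inputs and applies a non-linear function, and neurons are grouped into layers. The sigmoid activation, the weights and the two-layer shape are assumptions chosen for illustration; no learning step is shown.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer_output(inputs, weight_rows, biases):
    """Apply one neuron per row of weights to the same input signals."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = layer_output([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = layer_output(hidden, [[1.0, 1.0]], [-1.0])
```

Training would adjust the numbers in `weight_rows` and `biases`; here they are fixed so that only the flow of signals from the input layer to the output layer is visible.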

+

The original goal of the ANN approach was to solve problems in the same way that a [[human brain]] would. However, over time, attention moved to performing specific tasks, leading to deviations from [[biology]]. Artificial neural networks have been used on a variety of tasks, including [[computer vision]], [[speech recognition]], [[machine translation]], [[social network]] filtering, [[general game playing|playing board and video games]] and [[medical diagnosis]].

+

The original goal of the ANN approach was to solve problems in the same way that a human brain would. However, over time, attention moved to performing specific tasks, leading to deviations from biology. Artificial neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis.

+

人工神经网络方法的最初目标是以与人类大脑相同的方式解决问题。然而，随着时间的推移，其注意力转移到执行特定的任务上，导致了与生物学的偏差。现在人工神经网络已被用于各种任务中，包括'''计算机视觉 Computer Vision'''、'''语音识别 Speech Recognition'''、'''机器翻译 Machine Translation'''、'''社会网络过滤 Social Network Filtering'''、玩棋盘和视频游戏以及医疗诊断。

+

[[Deep learning]] consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are [[computer vision]] and [[speech recognition]].<ref>Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.802&rep=rep1&type=pdf Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations]" Proceedings of the 26th Annual International Conference on Machine Learning, 2009.</ref>

+

Deep learning consists of multiple hidden layers in an artificial neural network. This approach tries to model the way the human brain processes light and sound into vision and hearing. Some successful applications of deep learning are computer vision and speech recognition.

+

深度学习由人工神经网络中的多个隐层组成，通过这种方法可以尽量模拟人类大脑将光和声音处理成视觉和听觉的方式。深度学习的一些成功应用是计算机视觉和语音识别。

+

==== 决策树 Decision trees ====

+

Decision tree learning uses a [[decision tree]] as a [[Predictive modelling|predictive model]] to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, [[leaf node|leaves]] represent class labels and branches represent [[Logical conjunction|conjunction]]s of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically [[real numbers]]) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and [[decision making]]. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.

+

Decision tree learning uses a decision tree as a predictive model to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining and machine learning. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data, but the resulting classification tree can be an input for decision making.

+

'''决策树 Decision Tree'''学习是使用一个决策树作为一个预测模型，从对一个项目的观察（在分支中表示）到对该项目的目标值的结论（在叶子结点中表示）。它是统计学、数据挖掘和机器学习中常用的预测建模方法之一。目标变量接受到的一组离散值的树模型称为分类树; 在这些树结构中，叶子代表类标签，分支代表连接到这些类标签的特征。其中目标变量可以取连续值（通常是实数）的决策树称为回归树。在决策分析中，可以使用决策树直观地表示决策和决策。在数据挖掘中，决策树是用来描述数据的，但得到的分类树可以作为决策的输入。
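A classification tree can be represented as nested nodes in which internal nodes test a feature and leaves hold class labels. The sketch below only shows prediction with a hand-written tree; the features (`outlook`, `windy`) and labels are made up for illustration, and no tree-learning algorithm is implemented.

```python
# Internal nodes are dicts naming the feature to test; leaves are class labels.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
        "overcast": "play",
    },
}

def classify(node, item):
    """Follow the branch matching each tested feature until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][item[node["feature"]]]
    return node

label = classify(tree, {"outlook": "sunny", "windy": False})
```

The path taken for this item, outlook is sunny and windy is false, is the conjunction of feature tests that leads to the leaf label "play".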

+

==== 支持向量机 Support vector machines ====

+

Support vector machines (SVMs), also known as support vector networks, are a set of related [[supervised learning]] methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.<ref name="CorinnaCortes">{{Cite journal |last1=Cortes |first1=Corinna |authorlink1=Corinna Cortes |last2=Vapnik |first2=Vladimir N. |year=1995 |title=Support-vector networks |journal=[[Machine Learning (journal)|Machine Learning]] |volume=20 |issue=3 |pages=273–297 |doi=10.1007/BF00994018 |doi-access=free }}</ref> An SVM training algorithm is a non-[[probabilistic classification|probabilistic]], [[binary classifier|binary]], [[linear classifier]], although methods such as [[Platt scaling]] exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the [[kernel trick]], implicitly mapping their inputs into high-dimensional feature spaces.

+

Support vector machines (SVMs), also known as support vector networks, are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. An SVM training algorithm is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.

+

'''支持向量机 Support vector machines，SVMs'''，也称为支持向量网络，是一系列用于分类和回归的相关监督式学习方法。给定一组训练样本，每个样本标记为两个类别中的一个，SVM 训练算法通过建立一个模型来预测一个新样本是两个类别中的哪一个。支持向量机的训练算法用到的是一种非概率的二进制线性分类器，尽管在概率分类环境中也存在使用支持向量机的方法，如 Platt 缩放法。除了执行线性分类，支持向量机可以有效地执行非线性分类使用所谓的'''核技巧 Kernel trick'''，隐式地将模型输入映射到高维特征空间。
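The kernel trick can be checked numerically on a small case. For two-dimensional inputs, the degree-2 polynomial kernel k(x, y) = (x · y)² gives the same value as an ordinary dot product after mapping each input to the three-dimensional feature vector (x₁², √2·x₁x₂, x₂²). The function names and sample points below are assumptions for illustration; this is not an SVM training procedure.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, y) ** 2

def explicit_map(x):
    """Feature map whose dot product reproduces poly_kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
implicit = poly_kernel(x, y)                      # never builds the 3-D vectors
explicit = dot(explicit_map(x), explicit_map(y))  # builds them explicitly
```

Both values agree, which is why a linear classifier that only needs dot products can operate in the higher-dimensional space without ever constructing the mapped vectors.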

+

[[Image:Linear regression.svg|thumb|upright=1.3|Illustration of linear regression on a data set.]]

+

Illustration of linear regression on a data set.

+

数据集上的线性回归。

+

==== 回归分析 Regression analysis ====

+

Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is [[linear regression]], where a single line is drawn to best fit the given data according to a mathematical criterion such as [[ordinary least squares]]. The latter is often extended by [[regularization (mathematics)|regularization]] methods to mitigate overfitting and bias, as in [[ridge regression]]. When dealing with non-linear problems, go-to models include [[polynomial regression]] (for example, used for trendline fitting in Microsoft Excel<ref>{{cite web|last1=Stevenson|first1=Christopher|title=Tutorial: Polynomial Regression in Excel|url=https://facultystaff.richmond.edu/~cstevens/301/Excel4.html|website=facultystaff.richmond.edu|accessdate=22 January 2017}}</ref>), [[logistic regression]] (often used in [[statistical classification]]) or even [[kernel regression]], which introduces non-linearity by taking advantage of the [[kernel trick]] to implicitly map input variables to higher dimensional space.

+

Regression analysis encompasses a large variety of statistical methods to estimate the relationship between input variables and their associated features. Its most common form is linear regression, where a single line is drawn to best fit the given data according to a mathematical criterion such as ordinary least squares. The latter is often extended by regularization methods to mitigate overfitting and bias, as in ridge regression. When dealing with non-linear problems, go-to models include polynomial regression (for example, used for trendline fitting in Microsoft Excel), logistic regression (often used in statistical classification) or even kernel regression, which introduces non-linearity by taking advantage of the kernel trick to implicitly map input variables to higher dimensional space.

+

'''回归分析 Regression Analysis'''包含了大量的统计方法来估计输入变量和它们的相关特征之间的关系。它最常见的形式是'''线性回归 Linear Regression'''，根据一个数学标准，比如一般最小平方法，画一条线来最好地拟合给定的数据。后者通常通过正则化(数学)方法来扩展，以减少过拟合和偏差，如岭回归。在处理非线性问题时，常用的模型包括多项式回归(例如，在 Microsoft Excel 中用于趋势线拟合)、 Logit模型回归（通常用于分类）甚至核回归，它利用核技巧将输入变量隐式地映射到更高维度空间，从而引入了非线性。
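For a single input variable, the ordinary least squares line has a closed-form solution, and a ridge penalty on the slope simply adds a constant to the denominator. The sketch below assumes the intercept is left unpenalized; the function name `fit_line`, the parameter `ridge` and the sample points are illustrative assumptions.

```python
def fit_line(xs, ys, ridge=0.0):
    """Fit y = slope * x + intercept by least squares.

    With ridge > 0 the slope is shrunk towards zero (ridge regression with an
    unpenalized intercept); ridge = 0 gives ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + ridge)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # points lying exactly on y = 2x + 1

ols_slope, ols_intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, ridge=5.0)
```

On this data the unpenalized fit recovers slope 2 and intercept 1, while the penalized fit returns a flatter line, which is the trade-off regularization makes to reduce overfitting.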

+

==== 贝叶斯网络 Bayesian networks ====

+

[[Image:SimpleBayesNetNodes.svg|thumb|right|A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. 一个简单的贝叶斯网路。雨水会影响喷头是否被激活，雨水和喷头都会影响草地是否湿润。]]

+

A simple Bayesian network. Rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet.

A Bayesian network, belief network or directed acyclic graphical model is a probabilistic [[graphical model]] that represents a set of [[random variables]] and their [[conditional independence]] with a [[directed acyclic graph]] (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform [[inference]] and learning. Bayesian networks that model sequences of variables, like [[speech recognition|speech signals]] or [[peptide sequence|protein sequences]], are called [[dynamic Bayesian network]]s. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called [[influence diagram]]s.
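
To make the idea of inference concrete, here is a minimal sketch for a three-node rain/sprinkler/wet-grass network that computes the probability of rain given that the grass is wet by enumerating the joint distribution. The probability tables are illustrative assumptions, not values from this article, and enumeration is only practical for very small networks.

```python
from itertools import product

# Assumed (illustrative) conditional probability tables.
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # keyed by (sprinkler, rain)
    (False, False): 0.0,
    (False, True): 0.8,
    (True, False): 0.9,
    (True, True): 0.99,
}


def bernoulli(p_true, value):
    """Probability that a binary variable with P(True) = p_true equals value."""
    return p_true if value else 1.0 - p_true


def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the edges of the DAG."""
    return (bernoulli(P_RAIN, rain)
            * bernoulli(P_SPRINKLER_GIVEN_RAIN[rain], sprinkler)
            * bernoulli(P_WET_GIVEN[(sprinkler, rain)], wet))


def prob_rain_given_wet():
    """P(rain | grass is wet), summing out the sprinkler variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True)
                   for r, s in product((True, False), repeat=2))
    return numerator / evidence


posterior = prob_rain_given_wet()
```

The factorization in <code>joint</code> mirrors the graph: each variable is conditioned only on its parents, which is what lets the network encode the full joint distribution with a few small tables.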

==== Genetic algorithms ====

+

A genetic algorithm (GA) is a [[search algorithm]] and [[heuristic (computer science)|heuristic]] technique that mimics the process of [[natural selection]], using methods such as [[Mutation (genetic algorithm)|mutation]] and [[Crossover (genetic algorithm)|crossover]] to generate new [[Chromosome (genetic algorithm)|genotype]]s in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms were used in the 1980s and 1990s.<ref>{{cite journal |last1=Goldberg |first1=David E. |first2=John H. |last2=Holland |title=Genetic algorithms and machine learning |journal=[[Machine Learning (journal)|Machine Learning]] |volume=3 |issue=2 |year=1988 |pages=95–99 |doi=10.1007/bf00113892|url=https://deepblue.lib.umich.edu/bitstream/2027.42/46947/1/10994_2005_Article_422926.pdf }}</ref><ref>{{Cite journal |title=Machine Learning, Neural and Statistical Classification |journal=Ellis Horwood Series in Artificial Intelligence |first1=D. |last1=Michie |first2=D. J. |last2=Spiegelhalter |first3=C. C. |last3=Taylor |year=1994 |bibcode=1994mlns.book.....M }}</ref> Conversely, machine learning techniques have been used to improve the performance of genetic and [[evolutionary algorithm]]s.<ref>{{cite journal |last1=Zhang |first1=Jun |last2=Zhan |first2=Zhi-hui |last3=Lin |first3=Ying |last4=Chen |first4=Ni |last5=Gong |first5=Yue-jiao |last6=Zhong |first6=Jing-hui |last7=Chung |first7=Henry S.H. |last8=Li |first8=Yun |last9=Shi |first9=Yu-hui |title=Evolutionary Computation Meets Machine Learning: A Survey |journal=Computational Intelligence Magazine |year=2011 |volume=6 |issue=4 |pages=68–75 |doi=10.1109/mci.2011.942584}}</ref>
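
The mutation and crossover steps described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the toy fitness function <code>one_max</code> (count of 1-bits), tournament selection, elitism and all parameter values are assumptions made for the example.

```python
import random


def one_max(bits):
    """Toy fitness function: the number of 1-bits in the genome."""
    return sum(bits)


def evolve(fitness, genome_length=20, population_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Return the fittest genome found after the given number of generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)

    def select():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best genome over unchanged
        while len(children) < population_size:
            mother, father = select(), select()
            cut = rng.randrange(1, genome_length)   # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit independently with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


best_genome = evolve(one_max)
```

Because the best genome is copied into every new generation, the best fitness never decreases from one generation to the next; without elitism a good solution can be lost to mutation or crossover.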

=== Training models ===

+

Machine learning models usually require a lot of data in order to perform well. When training a model, one typically needs to collect a large, representative sample of data for the training set. Training data can be as varied as a corpus of text, a collection of images, or data collected from individual users of a service. [[Overfitting]] is something to watch out for when training a machine learning model.
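
One common way to watch for overfitting is to hold out part of the data and compare the error on the training set with the error on the held-out set. The sketch below does this with a deliberately overfitting model, a nearest-neighbour lookup that simply memorizes the training examples. The helper names, the synthetic data and the 25% hold-out fraction are all assumptions made for illustration.

```python
import random


def train_validation_split(examples, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def nearest_neighbor_predict(train, x):
    """Predict the output of the training example whose input is closest to x."""
    return min(train, key=lambda example: abs(example[0] - x))[1]


def nearest_neighbor_error(train, examples):
    """Mean squared error of the memorizing model on the given examples."""
    return sum((nearest_neighbor_predict(train, x) - y) ** 2
               for x, y in examples) / len(examples)


# Synthetic noisy data around the line y = 2x.
rng = random.Random(1)
data = [(i / 10, 2 * (i / 10) + rng.gauss(0, 0.5)) for i in range(40)]

train, validation = train_validation_split(data)
train_error = nearest_neighbor_error(train, train)            # memorized
validation_error = nearest_neighbor_error(train, validation)  # unseen data
```

The training error is zero because every training point is its own nearest neighbour, while the validation error is not; a gap of this kind between the two is the usual symptom of overfitting.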

==== Federated learning ====

+

Federated learning is a new approach to training machine learning models that decentralizes the training process, so that users' privacy can be maintained because their data does not need to be sent to a centralized server. This also increases efficiency by distributing the training computation across many devices. For example, [[Gboard]] uses federated machine learning to train search query prediction models on users' mobile phones without having to send individual searches back to [[Google]].<ref>{{Cite web|url=http://ai.googleblog.com/2017/04/federated-learning-collaborative.html|title=Federated Learning: Collaborative Machine Learning without Centralized Training Data|website=Google AI Blog|language=en|access-date=2019-06-08}}</ref>
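
A minimal sketch of the idea, assuming a simple weighted-averaging scheme (often called federated averaging) and a one-parameter model y = w·x: each client runs a few gradient steps on its own data, and only the resulting weight, never the data, is sent back to be averaged. The function names, the client datasets and the hyperparameters are hypothetical, and real systems add secure aggregation, client sampling and much larger models.

```python
def local_update(weight, examples, learning_rate=0.05, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error of the model y = weight * x.
        gradient = sum(2 * (weight * x - y) * x
                       for x, y in examples) / len(examples)
        weight -= learning_rate * gradient
    return weight


def federated_average(client_datasets, rounds=20, initial_weight=0.0):
    """Server loop: broadcast the weight, collect local updates, average them."""
    weight = initial_weight
    total_examples = sum(len(dataset) for dataset in client_datasets)
    for _ in range(rounds):
        # Each client trains locally; only (weight, example count) leaves it.
        updates = [(local_update(weight, dataset), len(dataset))
                   for dataset in client_datasets]
        # Average the client weights, weighted by local dataset size.
        weight = sum(w * n for w, n in updates) / total_examples
    return weight


# Hypothetical per-device datasets, all generated by y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
global_weight = federated_average(clients)
```

In this toy setting the averaged weight converges to the shared underlying value of 3 even though no client ever reveals its examples to the server.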

== Applications ==
