# Pattern recognition

This entry is currently being translated by Henry and reviewed by Miyasaki.

**Pattern recognition** is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering. Some modern approaches to pattern recognition use machine learning, driven by the increased availability of big data and a new abundance of processing power. Pattern recognition and machine learning can nonetheless be viewed as two facets of the same field of application, and together they have developed substantially over the past few decades. A modern definition of pattern recognition is:

> The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories.^{[1]}

This article focuses on machine learning approaches to pattern recognition. Pattern recognition systems are in many cases trained from labeled "training" data (supervised learning), but when no labeled data are available, other algorithms can be used to discover previously unknown patterns (unsupervised learning). Machine learning is strongly related to pattern recognition and originates from artificial intelligence. Knowledge discovery in databases (KDD) and data mining focus more on unsupervised methods and have a stronger connection to business use. Pattern recognition focuses more on the signal and also takes signal acquisition and signal processing into consideration. It originated in engineering, and the term is popular in the context of computer vision: a leading computer vision conference is named the Conference on Computer Vision and Pattern Recognition (CVPR). In pattern recognition, there may be greater interest in formalizing, explaining and visualizing the pattern, while machine learning traditionally focuses on maximizing recognition rates. Yet all of these domains have evolved substantially from their roots in artificial intelligence, engineering and statistics, and they have become increasingly similar by integrating developments and ideas from each other.

In machine learning, pattern recognition is the assignment of a label to a given input value. In statistics, discriminant analysis was introduced for this same purpose in 1936. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of *classes* (for example, determining whether a given email is "spam" or "non-spam"). However, pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input;^{[3]} sequence labeling, which assigns a class to each member of a sequence of values^{[4]} (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.^{[5]}

Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation. This contrasts with *pattern-matching* algorithms, which look for exact matches of pre-existing patterns in the input. A common example of a pattern-matching algorithm is regular expression matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and word processors. Unlike pattern recognition, pattern matching is generally not a type of machine learning, although pattern-matching algorithms (especially with fairly general, carefully tailored patterns) can sometimes produce output of similar quality to that of pattern-recognition algorithms.
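To make the contrast concrete, here is a small Python sketch. The regular expression `colou?r` either matches the whole input exactly or returns nothing. `difflib.get_close_matches` from the standard library returns the most similar known word even when the input is misspelled. The vocabulary is hypothetical, and difflib's string-similarity ratio only stands in for "most likely" matching. It is not a learned statistical model.

```python
import difflib
import re

# Hypothetical vocabulary of known words, used only for illustration.
VOCABULARY = ["color", "colour", "collar"]


def exact_match(text):
    """Pattern matching: the regex either matches the whole input or nothing is returned."""
    m = re.fullmatch(r"colou?r", text)
    return m.group(0) if m else None


def most_likely_match(text):
    """Tolerant matching: return the most similar vocabulary word, if any is close enough.

    difflib's similarity ratio is a stand-in here, not a trained pattern recognizer.
    """
    candidates = difflib.get_close_matches(text, VOCABULARY, n=1, cutoff=0.6)
    return candidates[0] if candidates else None


for word in ["colour", "colr"]:
    print(word, "->", exact_match(word), "|", most_likely_match(word))
```

For the misspelled input `colr`, the regex returns `None`, while the similarity-based match still returns `color`.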

## Overview

Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value. *Supervised learning* assumes that a set of *training data* (the *training set*) has been provided, consisting of a set of instances that have been properly labeled by hand with the correct output. A learning procedure then generates a *model* that attempts to meet two sometimes conflicting objectives: perform as well as possible on the training data, and generalize as well as possible to new data (usually, this means being as simple as possible, for some technical definition of "simple", in accordance with Occam's Razor, discussed below). Unsupervised learning, on the other hand, assumes training data that has not been hand-labeled, and attempts to find inherent patterns in the data that can then be used to determine the correct output value for new data instances.^{[9]} A combination of the two that has recently been explored is semi-supervised learning, which uses a combination of labeled and unlabeled data (typically a small set of labeled data combined with a large amount of unlabeled data). Note that in cases of unsupervised learning, there may be no training data at all to speak of; in other words, the data to be labeled *is* the training data.
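As a minimal illustration of supervised learning, the following Python sketch "trains" a nearest-centroid classifier on a few hand-labeled 2-D points and then labels a new point. Nearest-centroid is chosen only because it is one of the simplest possible models. The data and labels are hypothetical, and real systems use richer models and far more data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train_nearest_centroid(training_set):
    """Learn one centroid (mean feature vector) per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for x, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical hand-labeled training data.
training_set = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
model = train_nearest_centroid(training_set)
print(predict(model, (1.0, 0.0)))  # a new, unlabeled instance
```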

Note that sometimes different terms are used to describe the corresponding supervised and unsupervised learning procedures for the same type of output. For example, the unsupervised equivalent of classification is normally known as *clustering*, based on the common perception of the task as involving no training data to speak of, and of grouping the input data into *clusters* based on some inherent similarity measure (e.g. the distance between instances, considered as vectors in a multi-dimensional vector space), rather than assigning each input instance into one of a set of pre-defined classes. In some fields, the terminology is different: For example, in community ecology, the term "classification" is used to refer to what is commonly known as "clustering".
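For contrast, here is a minimal Python sketch of k-means, a common clustering algorithm. It was chosen for illustration, and the text does not prescribe any particular method. The algorithm receives no labels. It groups 2-D points by Euclidean distance, and the resulting cluster indices are arbitrary identifiers rather than predefined classes. Initializing with the first k points is a simplifying assumption; practical implementations use better seeding and multiple restarts.

```python
from math import dist


def kmeans(points, k, max_iters=100):
    """Cluster points into k groups; return (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive initialization (assumption)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```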

The piece of input data for which an output value is generated is formally termed an *instance*. The instance is formally described by a vector of *features*, which together constitute a description of all known characteristics of the instance. (These feature vectors can be seen as defining points in an appropriate multidimensional space, and methods for manipulating vectors in vector spaces can be correspondingly applied to them, such as computing the dot product or the angle between two vectors.) Typically, features are either categorical (also known as nominal, i.e., consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g., "large", "medium" or "small"), integer-valued (e.g., a count of the number of occurrences of a particular word in an email) or real-valued (e.g., a measurement of blood pressure). Often, categorical and ordinal data are grouped together; likewise for integer-valued and real-valued data. Furthermore, many algorithms work only in terms of categorical data and require that real-valued or integer-valued data be *discretized* into groups (e.g., less than 5, between 5 and 10, or greater than 10).
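The following Python sketch shows one common way to turn the feature types above into a numeric feature vector. This encoding is an assumption, because the text does not prescribe one. Categorical features are one-hot encoded, ordinal features become integer ranks, and integer- and real-valued features are kept as raw numbers. A separate helper discretizes a number into the three example groups. The feature names and schema are hypothetical.

```python
BLOOD_TYPES = ["A", "B", "AB", "O"]                  # categorical: no inherent order
SIZE_RANKS = {"small": 0, "medium": 1, "large": 2}   # ordinal: ordered levels


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def discretize(x):
    """Map a number to a group: 0 if x < 5, 1 if 5 <= x <= 10, 2 if x > 10."""
    if x < 5:
        return 0
    return 1 if x <= 10 else 2


def encode(blood_type, size, word_count, blood_pressure):
    """Build a feature vector from one hypothetical instance."""
    return (one_hot(blood_type, BLOOD_TYPES)   # categorical
            + [float(SIZE_RANKS[size])]        # ordinal
            + [float(word_count)]              # integer-valued
            + [float(blood_pressure)])         # real-valued


print(encode("AB", "medium", 3, 120.5))
print([discretize(v) for v in (3, 7, 12)])
```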

### Probabilistic classifiers

Many common pattern recognition algorithms are *probabilistic* in nature, in that they use statistical inference to find the best label for a given instance. Unlike algorithms that simply output a "best" label, probabilistic algorithms often also output the probability that the instance is described by the given label. In addition, many probabilistic algorithms output a list of the *N*-best labels with associated probabilities, for some value of *N*, instead of simply a single best label. When the number of possible labels is fairly small (e.g., in the case of classification), *N* may be set so that the probabilities of all possible labels are output. Probabilistic algorithms have many advantages over non-probabilistic algorithms:

- They output a confidence value associated with their choice. (Note that some other algorithms may also output confidence values, but in general, only for probabilistic algorithms is this value mathematically grounded in probability theory. Non-probabilistic confidence values generally cannot be given any specific meaning and can only be compared against other confidence values output by the same algorithm.)

- Correspondingly, they can *abstain* when the confidence of choosing any particular output is too low.

- Because they output probabilities, probabilistic pattern-recognition algorithms can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of *error propagation*.
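Here is a minimal Python sketch of the first two advantages. It assumes that some model, hypothetical here, has produced one real-valued score per label. A softmax turns the scores into a probability distribution, an N-best list can be read off from it, and the classifier abstains (returns `None`) when the top probability falls below a threshold. The threshold of 0.7 is arbitrary, and softmax outputs are only calibrated probabilities if the underlying model was trained for that.

```python
import math


def softmax(scores):
    """Normalize per-label scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def n_best(probs, n):
    """Return the n most probable (label, probability) pairs."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


def decide(probs, threshold=0.7):
    """Return the best label, or None (abstain) if its probability is too low."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None


confident = softmax({"spam": 2.0, "non-spam": 0.0})
uncertain = softmax({"spam": 0.1, "non-spam": 0.0})
print(n_best(confident, 2), decide(confident))
print(n_best(uncertain, 2), decide(uncertain))
```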

### Number of important feature variables

Feature selection algorithms attempt to directly prune out redundant or irrelevant features. A general introduction to feature selection that summarizes approaches and challenges is available.^{[11]} Because of its non-monotonic character, feature selection is an optimization problem in which, given a total of [math]\displaystyle{ n }[/math] features, the power set consisting of all [math]\displaystyle{ 2^n-1 }[/math] non-empty subsets of features needs to be explored. The branch-and-bound algorithm^{[12]} does reduce this complexity but is intractable for medium to large values of the number of available features [math]\displaystyle{ n }[/math]. For a large-scale comparison of feature-selection algorithms, see "Comparison of algorithms that select features for pattern classifiers", *Pattern Recognition* 33 (1): 25–41, doi:10.1016/S0031-3203(99)00041-2, CiteSeerX 10.1.1.55.1718.
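To illustrate why the search space grows as 2^n − 1, here is a brute-force Python sketch that scores every non-empty feature subset using `itertools.combinations`. The scoring function `toy_score` and its per-feature weights are hypothetical stand-ins for a real criterion such as cross-validated accuracy. Real criteria are usually not additive like this one, so cheap shortcuts are not guaranteed to find the best subset.

```python
from itertools import combinations


def exhaustive_feature_selection(features, score):
    """Score every non-empty subset; return (best subset, its score, subsets evaluated)."""
    best_subset, best_score, evaluated = None, float("-inf"), 0
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            evaluated += 1
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score, evaluated


# Hypothetical relevance weights; each selected feature also costs 1.0 (complexity penalty).
WEIGHTS = {"f1": 3.0, "f2": 0.5, "f3": 2.0, "f4": -1.0}


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset) - 1.0 * len(subset)


best, best_score, evaluated = exhaustive_feature_selection(list(WEIGHTS), toy_score)
print(best, best_score, evaluated)  # evaluated == 2**4 - 1
```

Adding a single feature doubles the number of subsets, which is why exhaustive search quickly becomes infeasible.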

Techniques to transform the raw feature vectors (**feature extraction**) are sometimes used prior to application of the pattern-recognition algorithm. For example, feature extraction algorithms attempt to reduce a high-dimensional feature vector into a lower-dimensional vector that is easier to work with and encodes less redundancy, using mathematical techniques such as principal component analysis (PCA). The distinction between **feature selection** and **feature extraction** is that the features resulting from feature extraction are of a different sort than the original features and may not be easily interpretable, while the features left after feature selection are simply a subset of the original features.
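Here is a minimal, dependency-free Python sketch of PCA-based feature extraction. It mean-centers the data, forms the sample covariance matrix, finds the first principal component by power iteration, and projects each 2-D vector onto that component, yielding one new feature per instance. Power iteration is used only to avoid external libraries. A production system would typically use an SVD or eigendecomposition routine from a numerical library. The data points are hypothetical.

```python
import math


def mean_center(X):
    d = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(C, iters=200):
    """Dominant eigenvector of C by power iteration (assumes C is not all zeros)."""
    v = [1.0] * len(C)
    for _ in range(iters):
        w = [sum(c_ij * v_j for c_ij, v_j in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_1d(X):
    """Project each feature vector onto the first principal component."""
    Xc = mean_center(X)
    v = first_principal_component(covariance(Xc))
    return [sum(a * b for a, b in zip(row, v)) for row in Xc], v


X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]  # roughly along y = 2x
z, v = pca_1d(X)
print(v)  # close to (1, 2) / sqrt(5)
print(z)  # one extracted feature per instance
```

The extracted feature `z` is a linear combination of both original features. Unlike a selected feature, it does not correspond to either original measurement on its own.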

## Problem statement (supervised version)

Formally, the problem of supervised pattern recognition can be stated as follows: Given an unknown function [math]\displaystyle{ g:\mathcal{X}\rightarrow\mathcal{Y} }[/math] (the *ground truth*) that maps input instances [math]\displaystyle{ \boldsymbol{x} \in \mathcal{X} }[/math] to output labels [math]\displaystyle{ y \in \mathcal{Y} }[/math], along with training data [math]\displaystyle{ \mathbf{D} = \{(\boldsymbol{x}_1,y_1),\dots,(\boldsymbol{x}_n, y_n)\} }[/math] assumed to represent accurate examples of the mapping, produce a function [math]\displaystyle{ h:\mathcal{X}\rightarrow\mathcal{Y} }[/math] that approximates the correct mapping [math]\displaystyle{ g }[/math] as closely as possible. (For example, if the problem is filtering spam, then [math]\displaystyle{ \boldsymbol{x}_i }[/math] is some representation of an email and [math]\displaystyle{ y }[/math] is either "spam" or "non-spam".) For this to be a well-defined problem, "approximates as closely as possible" needs to be defined rigorously. In decision theory, this is done by specifying a loss function or cost function that assigns a specific value to the "loss" resulting from producing an incorrect label. The goal then is to minimize the expected loss, with the expectation taken over the probability distribution of [math]\displaystyle{ \mathcal{X} }[/math]. In practice, neither the distribution of [math]\displaystyle{ \mathcal{X} }[/math] nor the ground truth function [math]\displaystyle{ g:\mathcal{X}\rightarrow\mathcal{Y} }[/math] is known exactly; both can only be estimated empirically by collecting a large number of samples of [math]\displaystyle{ \mathcal{X} }[/math] and hand-labeling them using the correct value of [math]\displaystyle{ \mathcal{Y} }[/math] (a time-consuming process, which is typically the limiting factor in the amount of data of this sort that can be collected). The particular loss function depends on the type of label being predicted.
For example, in the case of classification, the simple zero-one loss function is often sufficient. This corresponds simply to assigning a loss of 1 to any incorrect labeling and implies that the optimal classifier minimizes the error rate on independent test data (i.e., the fraction of instances that the learned function [math]\displaystyle{ h:\mathcal{X}\rightarrow\mathcal{Y} }[/math] labels wrongly, which is equivalent to maximizing the number of correctly classified instances). The goal of the learning procedure is then to minimize the error rate (maximize the correctness) on a "typical" test set.
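The zero-one loss and the resulting error rate are simple to compute. In this Python sketch, `h` is a hypothetical keyword-based spam filter that stands in for a learned function, and the four labeled test emails are made up for illustration.

```python
def zero_one_loss(y_true, y_pred):
    """Loss of 1 for an incorrect label, 0 for a correct one."""
    return 0 if y_pred == y_true else 1


def error_rate(h, test_set):
    """Average zero-one loss of classifier h over (instance, label) pairs."""
    return sum(zero_one_loss(y, h(x)) for x, y in test_set) / len(test_set)


def h(email):
    # Hypothetical learned function: flags emails that mention "free money".
    return "spam" if "free money" in email.lower() else "non-spam"


test_set = [
    ("Free money now", "spam"),
    ("Meeting at 10", "non-spam"),
    ("free money inside", "spam"),
    ("Claim your prize", "spam"),  # missed by h
]
rate = error_rate(h, test_set)
print(rate, 1 - rate)  # error rate and accuracy
```

Because each mistake costs exactly 1, minimizing the average loss is the same as maximizing the fraction of correctly labeled emails.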

For a probabilistic pattern recognizer, the problem is instead to estimate the probability of each possible output label given a particular input instance, i.e., to estimate a function of the form

- [math]\displaystyle{ p({\rm label}|\boldsymbol{x},\boldsymbol\theta) = f\left(\boldsymbol{x};\boldsymbol{\theta}\right) }[/math]

where the feature vector input is [math]\displaystyle{ \boldsymbol{x} }[/math], and the function *f* is typically parameterized by some parameters [math]\displaystyle{ \boldsymbol{\theta} }[/math].^{[15]} In a discriminative approach to the problem, *f* is estimated directly. In a generative approach, however, the inverse probability [math]\displaystyle{ p({\boldsymbol{x}|\rm label}) }[/math] is instead estimated and combined with the prior probability [math]\displaystyle{ p({\rm label}|\boldsymbol\theta) }[/math] using Bayes' rule, as follows:

- [math]\displaystyle{ p({\rm label}|\boldsymbol{x},\boldsymbol\theta) = \frac{p(\boldsymbol{x}|{\rm label},\boldsymbol\theta)\, p({\rm label}|\boldsymbol\theta)}{\sum_{L \in \text{all labels}} p(\boldsymbol{x}|L,\boldsymbol\theta)\, p(L|\boldsymbol\theta)}. }[/math]
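The discrete form of Bayes' rule above can be sketched in Python for a hypothetical generative model with two labels. Here θ consists of a prior probability and a 1-D Gaussian class-conditional density p(x | label, θ) for each label. The parameter values are invented for illustration; in practice they would be estimated from training data.

```python
import math


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


# Hypothetical parameters theta: prior p(label | theta) and likelihood parameters.
THETA = {
    "A": {"prior": 0.7, "mean": 0.0, "std": 1.0},
    "B": {"prior": 0.3, "mean": 3.0, "std": 1.0},
}


def posterior(x, theta=THETA):
    """p(label | x, theta) via Bayes' rule, summing over all labels in the denominator."""
    joint = {label: p["prior"] * gaussian_pdf(x, p["mean"], p["std"])
             for label, p in theta.items()}
    evidence = sum(joint.values())
    return {label: j / evidence for label, j in joint.items()}


print(posterior(1.5))  # equidistant from both means: posterior equals the prior
print(posterior(3.0))
```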

When the labels are continuously distributed (e.g., in regression analysis), the denominator involves integration rather than summation:

- [math]\displaystyle{ p({\rm label}|\boldsymbol{x},\boldsymbol\theta) = \frac{p(\boldsymbol{x}|{\rm label},\boldsymbol\theta)\, p({\rm label}|\boldsymbol\theta)}{\int_{L \in \text{all labels}} p(\boldsymbol{x}|L,\boldsymbol\theta)\, p(L|\boldsymbol\theta) \operatorname{d}L}. }[/math]

The value of [math]\displaystyle{ \boldsymbol\theta }[/math] is typically learned using maximum a posteriori (MAP) estimation. This finds the value that best meets two conflicting objectives at the same time: to perform as well as possible on the training data (smallest error rate) and to keep the model as simple as possible. Essentially, this combines maximum likelihood estimation with a regularization procedure that favors simpler models over more complex ones. In a Bayesian context, the regularization procedure can be viewed as placing a prior probability [math]\displaystyle{ p(\boldsymbol\theta) }[/math] on different values of [math]\displaystyle{ \boldsymbol\theta }[/math]. Mathematically:

- [math]\displaystyle{ \boldsymbol\theta^* = \arg \max_{\boldsymbol\theta} p(\boldsymbol\theta|\mathbf{D}) }[/math]

where [math]\displaystyle{ \boldsymbol\theta^* }[/math] is the value used for [math]\displaystyle{ \boldsymbol\theta }[/math] in the subsequent evaluation procedure, and [math]\displaystyle{ p(\boldsymbol\theta|\mathbf{D}) }[/math], the posterior probability of [math]\displaystyle{ \boldsymbol\theta }[/math] given the training data [math]\displaystyle{ \mathbf{D} = \{(\boldsymbol{x}_1, y_1), \ldots, (\boldsymbol{x}_n, y_n)\} }[/math], is given up to a normalizing constant that does not depend on [math]\displaystyle{ \boldsymbol\theta }[/math] (and therefore does not affect the maximization) by

- [math]\displaystyle{ p(\boldsymbol\theta|\mathbf{D}) \propto \left[\prod_{i=1}^n p(y_i|\boldsymbol{x}_i,\boldsymbol\theta) \right] p(\boldsymbol\theta). }[/math]
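
A minimal sketch of MAP estimation follows. It assumes a one-feature logistic regression model p(y = 1 | x, θ) = σ(wx + b) and a zero-mean Gaussian prior p(θ) on θ = (w, b). The prior variance, learning rate, step count, and toy data are hypothetical choices. The code maximizes log p(θ|D) = Σ log p(y_i|x_i, θ) + log p(θ) + const by plain gradient ascent. With a Gaussian prior, the log p(θ) term acts as an L2 penalty, which is one concrete form of the regularization described above.

```python
import math


def sigmoid(z):
    """Logistic function; the model's p(y = 1 | x, theta)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_posterior_grad(w, b, xs, ys, prior_var):
    """Gradient of sum_i log p(y_i|x_i,theta) + log p(theta) with respect to (w, b).

    The zero-mean Gaussian prior with variance prior_var contributes -theta/prior_var,
    i.e. an L2 (weight-decay) penalty.
    """
    grad_w = -w / prior_var
    grad_b = -b / prior_var
    for x, y in zip(xs, ys):
        residual = y - sigmoid(w * x + b)  # labels y are 0 or 1
        grad_w += residual * x
        grad_b += residual
    return grad_w, grad_b


def map_estimate(xs, ys, prior_var=1.0, lr=0.1, steps=2000):
    """theta* = argmax_theta p(theta | D), found by fixed-step gradient ascent."""
    w = b = 0.0
    for _ in range(steps):
        grad_w, grad_b = log_posterior_grad(w, b, xs, ys, prior_var)
        w += lr * grad_w
        b += lr * grad_b
    return w, b


# Hypothetical, linearly separable toy data.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w_weak, _ = map_estimate(xs, ys, prior_var=1.0)
w_strong, _ = map_estimate(xs, ys, prior_var=0.1)
print(w_weak, w_strong)  # the stronger prior (smaller variance) yields the smaller weight
```

Because the toy data are linearly separable, maximum likelihood alone would push the weight toward infinity. The prior keeps θ* finite, and a tighter prior shrinks it further.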

In the Bayesian approach to this problem, instead of choosing a single parameter vector [math]\displaystyle{ \boldsymbol{\theta}^* }[/math], the probability of a given label for a new instance [math]\displaystyle{ \boldsymbol{x} }[/math] is computed by integrating over all possible values of [math]\displaystyle{ \boldsymbol\theta }[/math], weighted according to the posterior probability:

- [math]\displaystyle{ p({\rm label}|\boldsymbol{x},\mathbf{D}) = \int p({\rm label}|\boldsymbol{x},\boldsymbol\theta)\,p(\boldsymbol{\theta}|\mathbf{D}) \operatorname{d}\boldsymbol{\theta}. }[/math]
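
The integral can be made concrete in the simplest possible setting. The sketch below makes several simplifying assumptions:

- there are no features,
- θ is the probability that the label is 1,
- the prior p(θ) is a Beta(a, b) distribution,
- the training data contain k ones among n labels.

Under these assumptions the code evaluates p(label = 1 | D) = ∫ θ p(θ|D) dθ with a midpoint rule. It then compares the result with the known closed form (k + a)/(n + a + b) and with the MAP plug-in value (k + a − 1)/(n + a + b − 2).

```python
def predictive_numeric(k, n, a=1.0, b=1.0, grid=100_000):
    """p(label=1 | D) = integral of theta * p(theta | D) over (0, 1), by the midpoint rule.

    The posterior is Beta(k + a, n - k + b); its normalizing constant cancels
    because we divide by the integral of the same unnormalized density.
    """
    h = 1.0 / grid
    numerator = denominator = 0.0
    for i in range(grid):
        theta = (i + 0.5) * h
        density = theta ** (k + a - 1) * (1.0 - theta) ** (n - k + b - 1)
        numerator += theta * density
        denominator += density
    return numerator / denominator


def predictive_closed_form(k, n, a=1.0, b=1.0):
    """Posterior mean of Beta(k + a, n - k + b)."""
    return (k + a) / (n + a + b)


def map_plug_in(k, n, a=1.0, b=1.0):
    """Mode of the same posterior (assumes the mode lies strictly inside (0, 1))."""
    return (k + a - 1) / (n + a + b - 2)


# Hypothetical data: 7 of 10 training labels are 1; uniform prior a = b = 1.
print(predictive_numeric(7, 10), predictive_closed_form(7, 10), map_plug_in(7, 10))
```

Averaging over θ gives 8/12 ≈ 0.667, whereas plugging in the single MAP value gives 0.7. The two values agree more and more closely as n grows.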

### Frequentist or Bayesian approach to pattern recognition

The first pattern classifier, the linear discriminant presented by Fisher, was developed in the frequentist tradition. The frequentist approach treats the model parameters as unknown but objective quantities, which are computed (estimated) from the collected data. For the linear discriminant, these parameters are precisely the mean vectors and the covariance matrix. The probability of each class, [math]\displaystyle{ p({\rm label}|\boldsymbol\theta) }[/math], is also estimated from the collected dataset. Note that using Bayes' rule in a pattern classifier does not by itself make the classification approach Bayesian.
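
In the frequentist spirit just described, the sketch below computes plug-in estimates of the class means, the common (pooled) variance, and the class probabilities from labeled data. It then classifies new inputs with the resulting linear discriminant. For brevity it is restricted to a single feature, so the covariance matrix reduces to one variance. The toy data and function names are hypothetical.

```python
import math
from statistics import mean


def fit_lda_1d(xs, ys):
    """Frequentist plug-in estimates: class means, pooled variance, class priors."""
    labels = sorted(set(ys))
    groups = {c: [x for x, y in zip(xs, ys) if y == c] for c in labels}
    means = {c: mean(g) for c, g in groups.items()}
    n = len(xs)
    # Common within-class variance (unbiased pooled estimate).
    pooled_var = sum((x - means[y]) ** 2 for x, y in zip(xs, ys)) / (n - len(labels))
    priors = {c: len(g) / n for c, g in groups.items()}
    return means, pooled_var, priors


def predict_lda_1d(x, means, pooled_var, priors):
    """Pick the class with the largest linear discriminant score.

    For equal-variance Gaussian classes, log p(x|c) + log p(c) equals
    x*mu_c/var - mu_c^2/(2 var) + log p(c) plus a term common to all classes.
    """
    scores = {
        c: x * m / pooled_var - m * m / (2 * pooled_var) + math.log(priors[c])
        for c, m in means.items()
    }
    return max(scores, key=scores.get)


xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]  # hypothetical measurements
ys = [0, 0, 0, 1, 1, 1]
params = fit_lda_1d(xs, ys)
print(params)  # means 2.0 and 7.0, pooled variance 1.0, priors 0.5 each
print(predict_lda_1d(4.0, *params), predict_lda_1d(5.0, *params))  # prints 0 1
```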

Bayesian statistics has its origin in Greek philosophy, where a distinction was already made between 'a priori' and 'a posteriori' knowledge. Later, Kant distinguished between what is known a priori, before observation, and the empirical knowledge gained from observations. In a Bayesian pattern classifier, the class probabilities [math]\displaystyle{ p({\rm label}|\boldsymbol\theta) }[/math] can be chosen by the user and are then a priori. Moreover, experience quantified as a priori parameter values can be weighted with empirical observations, for example using Beta and Dirichlet distributions, which are conjugate priors for binomial and multinomial class probabilities respectively. The Bayesian approach thus allows expert knowledge, in the form of subjective probabilities, to be combined seamlessly with objective observations.
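
The following minimal sketch shows how a priori knowledge can be mixed with observations. It assumes that the class probabilities are given a Dirichlet prior with user-chosen pseudo-counts α_c (the values shown are hypothetical). The posterior mean of each class probability is then (n_c + α_c)/(N + Σα), where n_c is the count of class c among N observations. This estimate moves from the expert's prior guess toward the observed frequency n_c/N as data accumulate.

```python
def class_prior_posterior_mean(counts, alpha):
    """Posterior mean of the class probabilities under a Dirichlet(alpha) prior."""
    total = sum(counts[c] + alpha[c] for c in counts)
    return {c: (counts[c] + alpha[c]) / total for c in counts}


def class_prior_frequentist(counts):
    """Relative frequencies: the frequentist estimate of the class probabilities."""
    n = sum(counts.values())
    return {c: counts[c] / n for c in counts}


counts = {"spam": 3, "ham": 7}  # hypothetical observed class counts
print(class_prior_frequentist(counts))  # spam 0.3, ham 0.7
print(class_prior_posterior_mean(counts, {"spam": 1, "ham": 1}))    # spam 4/12, ham 8/12
print(class_prior_posterior_mean(counts, {"spam": 10, "ham": 10}))  # a strong prior pulls toward 0.5
```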

Probabilistic pattern classifiers can be used according to a frequentist or a Bayesian approach.

## Uses

The face was automatically detected by special software.

Within medical science, pattern recognition is the basis for computer-aided diagnosis (CAD) systems. CAD describes a procedure that supports the doctor's interpretations and findings.

Other typical applications of pattern recognition techniques are automatic speech recognition, classification of text into several categories (e.g., spam or non-spam email messages), automatic recognition of handwriting on postal envelopes, automatic recognition of images of human faces, and handwriting image extraction from medical forms.^{[16]} The last two examples belong to the subtopic of pattern recognition called image analysis, which deals with digital images as input to pattern recognition systems.^{[17]}^{[18]}

Optical character recognition is a classic example of the application of a pattern classifier.

Starting in 1990, the way a person signs their name was captured with a stylus and overlay.^{[citation needed]} The strokes, speed, relative minimum, relative maximum, acceleration, and pressure are used to uniquely identify and confirm identity. Banks were first offered this technology, but were content to collect from the FDIC for any bank fraud and did not want to inconvenience customers.^{[citation needed]}

Artificial neural networks (neural net classifiers) and deep learning have many real-world applications in image processing, a few examples:

- identification and authentication: e.g., license plate recognition,^{[21]} fingerprint analysis, face detection/verification,^{[22]} and voice-based authentication.^{[23]}

- medical diagnosis: e.g., screening for cervical cancer (Papnet),^{[27]} breast tumors, or heart sounds;

- defence: various navigation and guidance systems, target recognition systems, shape recognition technology etc.

- mobility: advanced driver assistance systems, autonomous vehicle technology, etc.^{[29]}^{[30]}^{[31]}^{[32]}^{[33]}

For a discussion of the aforementioned applications of neural networks in image processing, see e.g.^{[39]}

In psychology, pattern recognition (making sense of and identifying objects) is closely related to perception, which explains how the sensory inputs humans receive are made meaningful. Pattern recognition can be thought of in two different ways: the first being template matching and the second being feature detection.

A template is a pattern used to produce items of the same proportions. The template-matching hypothesis suggests that incoming stimuli are compared with templates in the long term memory. If there is a match, the stimulus is identified.
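
The template-matching hypothesis has a direct computational analogue, sketched below. Several assumptions are purely illustrative and not part of the psychological theory:

- stimuli and stored templates are small binary bitmaps (hypothetical 3x3 letters),
- mismatch is measured as the sum of squared differences,
- a stimulus counts as identified only if its best match falls within a chosen threshold.

```python
def sum_squared_difference(a, b):
    """Mismatch between two equally sized bitmaps given as lists of rows."""
    return sum((p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def identify(stimulus, templates, threshold):
    """Return the label of the best-matching template, or None if no template matches well enough."""
    best_label, best_score = None, float("inf")
    for label, template in templates.items():
        score = sum_squared_difference(stimulus, template)
        if score < best_score:
            best_label, best_score = label, score
    return best_label if best_score <= threshold else None


# Hypothetical 3x3 templates held in "long-term memory".
TEMPLATES = {
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
}
noisy_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 1]]  # a "T" with one extra pixel
print(identify(noisy_t, TEMPLATES, threshold=2))  # prints T
```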

Feature detection models, such as the Pandemonium system for classifying letters (Selfridge, 1959), suggest that the stimuli are broken down into their component parts for identification. For example, a capital E has three horizontal lines and one vertical line.^{[40]}
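
The toy version below is loosely modeled on the Pandemonium idea and works in three stages:

1. "Feature demons" report counts of horizontal, vertical, and oblique strokes.
2. Each "letter demon" shouts louder the better those counts fit its own stroke profile.
3. A "decision demon" picks the loudest letter demon.

The stroke profiles and the scoring rule are simplified, hypothetical choices, not Selfridge's original design.

```python
# Hypothetical stroke profiles: (horizontal, vertical, oblique) line counts.
LETTER_PROFILES = {
    "E": (3, 1, 0),  # three horizontal lines and one vertical line
    "F": (2, 1, 0),
    "H": (1, 2, 0),
    "A": (1, 0, 2),
    "I": (0, 1, 0),
}


def letter_demon_shout(observed, profile):
    """Higher (louder) when the observed feature counts are closer to the profile."""
    return -sum(abs(o - p) for o, p in zip(observed, profile))


def decision_demon(observed):
    """Pick the letter whose demon shouts loudest."""
    return max(LETTER_PROFILES, key=lambda letter: letter_demon_shout(observed, LETTER_PROFILES[letter]))


print(decision_demon((3, 1, 0)))  # prints E
```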

## Algorithms

Algorithms for pattern recognition depend on the type of label output, on whether learning is supervised or unsupervised, and on whether the algorithm is statistical or non-statistical in nature. Statistical algorithms can further be categorized as generative or discriminative.

### Classification algorithms (supervised algorithms predicting categorical labels)

Parametric:^{[42]}

- Linear discriminant analysis

- Quadratic discriminant analysis

- Maximum entropy classifier (also known as logistic regression or multinomial logistic regression): Note that logistic regression is an algorithm for classification, despite its name. (The name comes from the fact that logistic regression uses an extension of a linear regression model to model the probability of an input being in a particular class.)

Nonparametric:^{[43]}

- Decision trees

- Kernel estimation and K-nearest-neighbor algorithms

- Naive Bayes classifier

- Neural networks (multi-layer perceptrons)

- Perceptrons

- Support vector machines

- Gene expression programming
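
To make one of the nonparametric entries concrete, here is a minimal k-nearest-neighbor classifier. It stores the labeled training points and labels a query by majority vote among the k closest points under Euclidean distance. The toy data and k = 3 are hypothetical. A tied vote goes to whichever tied label appears first among the sorted neighbors, which is a simplification.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]  # hypothetical data
print(knn_predict(train, (0.5, 0.5)), knn_predict(train, (5.5, 5.5)))  # prints a b
```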

### Clustering algorithms (unsupervised algorithms predicting categorical labels)

- Categorical mixture models

- Hierarchical clustering (agglomerative or divisive)

- K-means clustering

- Correlation clustering

- Kernel principal component analysis (Kernel PCA)
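
K-means clustering is commonly computed with Lloyd's algorithm, sketched minimally below. Each point is assigned to its nearest centroid, each centroid is recomputed as the mean of its cluster, and the loop stops once the centroids stop changing. Random initialization from the data, Euclidean distance, and the toy points are assumptions. In general the algorithm only reaches a local optimum, and the result can depend on the initialization.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen data points as initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)  # an empty cluster keeps its old centroid
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical data
centroids, clusters = kmeans(points, 2)
print(sorted(centroids))
```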

### Ensemble learning algorithms (supervised meta-algorithms for combining multiple learning algorithms together)

- Boosting (meta-algorithm)

- Bootstrap aggregating ("bagging")

- Ensemble averaging

- Mixture of experts
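
Bootstrap aggregating can be sketched in a few lines. Each base model is trained on a bootstrap sample (n draws with replacement from the n training examples), and the models' predictions are combined by majority vote. For brevity the base learner is a one-feature decision stump. The data, number of models, and seed are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """One-feature decision stump: predict `left` if x <= t else `right`, minimizing training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n indices drawn with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical one-feature data with labels 0 and 1.
predict = bagging([1, 2, 3, 4, 6, 7, 8, 9], [0, 0, 0, 0, 1, 1, 1, 1])
print([predict(x) for x in (0.5, 2, 8, 10)])
```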

### General algorithms for predicting arbitrarily-structured (sets of) labels

- Bayesian networks

- Markov random fields

### Multilinear subspace learning algorithms (predicting labels of multidimensional data using tensor representations)

Unsupervised:

### Real-valued sequence labeling algorithms (predicting sequences of real-valued labels)

Supervised (?):

- Kalman filters

- Particle filters
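
Below is a minimal sketch of a scalar Kalman filter. It assumes a hidden real value that follows a random walk (x_k = x_{k-1} + process noise) and is observed with Gaussian measurement noise. The noise variances, initial guess, and readings are hypothetical. At each step the filter predicts, then corrects the prediction by the Kalman gain times the measurement residual.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with Gaussian noise."""
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: x_k = x_{k-1} + w with w ~ N(0, process_var); the mean is unchanged.
        p = p + process_var
        # Update: blend the prediction with the measurement z ~ N(x_k, meas_var).
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy readings of a quantity whose true value is near 5.
readings = [5.3, 4.8, 5.1, 4.9, 5.2, 4.7, 5.0, 5.1]
print(kalman_1d(readings, process_var=1e-4, meas_var=0.1))
```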

### Regression algorithms (predicting real-valued labels)

Supervised:

- Gaussian process regression (kriging)

- Linear regression and extensions

- Neural networks and deep learning
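
In the simplest case of linear regression (one feature, least-squares fit), the slope and intercept have a closed form: slope = Σ(x − mean_x)(y − mean_y) / Σ(x − mean_x)², and intercept = mean_y − slope · mean_x. A minimal sketch with hypothetical data:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # nonzero as long as the xs are not all equal
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # prints (2.0, 1.0)
```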

Unsupervised:

- Independent component analysis

- Principal component analysis
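
The following minimal sketch of principal component analysis assumes that only the first principal component is wanted. It centers the data, forms the sample covariance matrix, and applies power iteration. Power iteration converges to the eigenvector with the largest eigenvalue, provided that eigenvalue is strictly dominant and the starting vector is not orthogonal to its eigenvector. The data and iteration count are hypothetical.

```python
import math


def first_principal_component(points, iters=200):
    """Unit-length direction of maximum variance, via power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))  # zero only if the data have no variance
        v = [x / norm for x in w]
    return v


# Hypothetical points lying exactly on the line y = 2x.
print(first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)]))
```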

### Sequence labeling algorithms (predicting sequences of categorical labels)

Supervised:

- Conditional random fields (CRFs)

- Hidden Markov models (HMMs)

- Maximum entropy Markov models (MEMMs)

- Recurrent neural networks (RNNs)
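
Once the parameters of a hidden Markov model are known, a label sequence is commonly obtained with the Viterbi algorithm. Viterbi is a dynamic program that finds the single most probable hidden-state path. The sketch below works in log space to avoid numerical underflow and assumes all probabilities are strictly positive. The two-state "Healthy"/"Fever" model is a hypothetical toy example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state sequence for an HMM (all probabilities must be > 0)."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    backpointer = [{}]
    for t in range(1, len(observations)):
        best.append({})
        backpointer.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            backpointer[t][s] = prev
    # Trace back from the most probable final state.
    path = [max(best[-1], key=best[-1].get)]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    return path[::-1]


# Hypothetical toy model.
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3}, "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
print(viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p))
# prints ['Healthy', 'Healthy', 'Fever']
```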

Unsupervised:

- Hidden Markov models (HMMs)

- Dynamic time warping (DTW)
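
Dynamic time warping compares two sequences that may be locally stretched or compressed in time. Below is a minimal sketch of the classic O(nm) dynamic program. It uses the absolute difference as the local cost (an assumption; other costs are also common) and applies no warping-window constraint.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                     cost[i][j - 1],      # b advances, a[i-1] repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # prints 0.0: warping absorbs the repeated 2
```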

## See also

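- Adaptive resonance theory

- Cache language model

- Compound-term processing

- Computer-aided diagnosis

- Data mining

- Deep learning

- Information theory

- List of numerical analysis software

- List of numerical libraries

- Machine learning

- Multilinear subspace learning

- Neocognitron

- Perception

- Perceptual learning

- Predictive analytics

- Prior knowledge for pattern recognition

- Sequence mining

- Template matching

- Contextual image classification

- List of datasets for machine learning research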

## References

- ↑ Template:Harv
- ↑ Template:Harv
- ↑ Howard, W. R. (2007-02-20). "Pattern Recognition and Machine Learning" (review of Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Heidelberg, Germany: Springer, 2006, i–xx, 740 pp., ISBN 0-387-31073-8, $74.95 hardcover). *Kybernetes*. **36**(2): 275. doi:10.1108/03684920710743466. ISSN 0368-492X.
- ↑ "Sequence Labeling" (PDF). *utah.edu*.
- ↑ Chiswell, Ian (2007). *Mathematical Logic*, p. 34. Oxford University Press. ISBN 9780199215621. OCLC 799802313.
- ↑ Howard, W. R. (2007-02-20). "Pattern Recognition and Machine Learning" (review of Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Heidelberg, Germany: Springer, 2006, i–xx, 740 pp., ISBN 0-387-31073-8, $74.95 hardcover). *Kybernetes*. **36**(2): 275. doi:10.1108/03684920710743466. ISSN 0368-492X.
- ↑ "Sequence Labeling" (PDF). *utah.edu*.
- ↑ Chiswell, Ian (2007). *Mathematical Logic*, p. 34. Oxford University Press. ISBN 9780199215621. OCLC 799802313.
- ↑ Carvalko, J. R.; Preston, K. (1972). "On Determining Optimum Simple Golay Marking Transforms for Binary Image Processing". *IEEE Transactions on Computers*. **21**(12): 1430–33. doi:10.1109/T-C.1972.223519.
- ↑ Carvalko, J. R.; Preston, K. (1972). "On Determining Optimum Simple Golay Marking Transforms for Binary Image Processing". *IEEE Transactions on Computers*. **21**(12): 1430–33. doi:10.1109/T-C.1972.223519.
- ↑ Guyon, Isabelle; Elisseeff, André (2003). "An Introduction to Variable and Feature Selection". *The Journal of Machine Learning Research*. **3**: 1157–1182.
- ↑ Foroutan, Iman; Sklansky, Jack (1987). "Feature Selection for Automatic Classification of Non-Gaussian Data". *IEEE Transactions on Systems, Man and Cybernetics*. **17**(2): 187–198. doi:10.1109/TSMC.1987.4309029.
- ↑ Guyon, Isabelle; Elisseeff, André (2003). "An Introduction to Variable and Feature Selection". *The Journal of Machine Learning Research*. **3**: 1157–1182.
- ↑ Foroutan, Iman; Sklansky, Jack (1987). "Feature Selection for Automatic Classification of Non-Gaussian Data". *IEEE Transactions on Systems, Man and Cybernetics*. **17**(2): 187–198. doi:10.1109/TSMC.1987.4309029.
- ↑ For linear discriminant analysis the parameter vector [math]\displaystyle{ \boldsymbol\theta }[/math] consists of the two mean vectors [math]\displaystyle{ \boldsymbol\mu_1 }[/math] and [math]\displaystyle{ \boldsymbol\mu_2 }[/math] and the common covariance matrix [math]\displaystyle{ \boldsymbol\Sigma }[/math].
- ↑ Milewski, Robert; Govindaraju, Venu (31 March 2008). "Binarization and cleanup of handwritten text from carbon copy medical form images". *Pattern Recognition*. **41**(4): 1308–1315. doi:10.1016/j.patcog.2007.08.018.
- ↑ Duda, Richard O.; Hart, Peter E.; Stork, David G. (2001). *Pattern Classification* (2nd ed.). New York: Wiley. ISBN 978-0-471-05669-0. https://books.google.com/books?id=Br33IRC3PkQC.
- ↑ Brunelli, R. (2009). *Template Matching Techniques in Computer Vision: Theory and Practice*. Wiley.
- ↑ Milewski, Robert; Govindaraju, Venu (31 March 2008). "Binarization and cleanup of handwritten text from carbon copy medical form images". *Pattern Recognition*. **41**(4): 1308–1315. doi:10.1016/j.patcog.2007.08.018.
- ↑ Brunelli, R. (2009). *Template Matching Techniques in Computer Vision: Theory and Practice*. Wiley.
- ↑ "The Automatic Number Plate Recognition Tutorial". http://anpr-tutorial.com/
- ↑ "Neural Networks for Face Recognition". Companion to Chapter 4 of the textbook *Machine Learning*.
- ↑ Poddar, Arnab; Sahidullah, Md; Saha, Goutam (March 2018). "Speaker Verification with Short Utterances: A Review of Challenges, Trends and Opportunities". *IET Biometrics*. **7**(2): 91–101. doi:10.1049/iet-bmt.2017.0065.
- ↑ "The Automatic Number Plate Recognition Tutorial". http://anpr-tutorial.com/
- ↑ "Neural Networks for Face Recognition". Companion to Chapter 4 of the textbook *Machine Learning*.
- ↑ Poddar, Arnab; Sahidullah, Md; Saha, Goutam (March 2018). "Speaker Verification with Short Utterances: A Review of Challenges, Trends and Opportunities". *IET Biometrics*. **7**(2): 91–101. doi:10.1049/iet-bmt.2017.0065.
- ↑ "PAPNET For Cervical Screening". Archived at Archive.is on 2012-07-08. Retrieved 2012-05-06.
- ↑ "PAPNET For Cervical Screening". Archived at Archive.is on 2012-07-08. Retrieved 2012-05-06.
- ↑ "Development of an Autonomous Vehicle Control Strategy Using a Single Camera and Deep Neural Networks (2018-01-0035 Technical Paper)". *SAE Mobilus* (saemobilus.sae.org). Retrieved 2019-09-06.
- ↑ Gerdes, J. Christian; Kegelman, John C.; Kapania, Nitin R.; Brown, Matthew; Spielberg, Nathan A. (2019-03-27). "Neural network vehicle models for high-performance automated driving". *Science Robotics*. **4**(28): eaaw1975. doi:10.1126/scirobotics.aaw1975. ISSN 2470-9476.
- ↑ Pickering, Chris (2017-08-15). "How AI is paving the way for fully autonomous cars". *The Engineer*. Retrieved 2019-09-06.
- ↑ Ray, Baishakhi; Jana, Suman; Pei, Kexin; Tian, Yuchi (2017-08-28). "DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars". arXiv:1708.08559. Bibcode:2017arXiv170808559T.
- ↑ Sinha, P. K.; Hadjiiski, L. M.; Mutib, K. (1993-04-01). "Neural Networks in Autonomous Vehicle Control". *IFAC Proceedings Volumes*. 1st IFAC International Workshop on Intelligent Autonomous Vehicles, Hampshire, UK, 18–21 April. **26**(1): 335–340. doi:10.1016/S1474-6670(17)49322-0. ISSN 1474-6670.
- ↑ "Development of an Autonomous Vehicle Control Strategy Using a Single Camera and Deep Neural Networks (2018-01-0035 Technical Paper)". *SAE Mobilus* (saemobilus.sae.org). Retrieved 2019-09-06.
- ↑ Gerdes, J. Christian; Kegelman, John C.; Kapania, Nitin R.; Brown, Matthew; Spielberg, Nathan A. (2019-03-27). "Neural network vehicle models for high-performance automated driving". *Science Robotics*. **4**(28): eaaw1975. doi:10.1126/scirobotics.aaw1975. ISSN 2470-9476.
- ↑ Pickering, Chris (2017-08-15). "How AI is paving the way for fully autonomous cars". *The Engineer*. Retrieved 2019-09-06.
- ↑ Ray, Baishakhi; Jana, Suman; Pei, Kexin; Tian, Yuchi (2017-08-28). "DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars". arXiv:1708.08559. Bibcode:2017arXiv170808559T.
- ↑ Sinha, P. K.; Hadjiiski, L. M.; Mutib, K. (1993-04-01). "Neural Networks in Autonomous Vehicle Control". *IFAC Proceedings Volumes*. 1st IFAC International Workshop on Intelligent Autonomous Vehicles, Hampshire, UK, 18–21 April. **26**(1): 335–340. doi:10.1016/S1474-6670(17)49322-0. ISSN 1474-6670.
- ↑ Egmont-Petersen, M.; de Ridder, D.; Handels, H. (2002). "Image processing with neural networks - a review". *Pattern Recognition*. **35**(10): 2279–2301. CiteSeerX 10.1.1.21.5444. doi:10.1016/S0031-3203(01)00178-9.
- ↑ "A-level Psychology Attention Revision - Pattern recognition | S-cool, the revision website". S-cool.co.uk. Retrieved 2012-09-17.
- ↑ "A-level Psychology Attention Revision - Pattern recognition | S-cool, the revision website". S-cool.co.uk. Retrieved 2012-09-17.
- ↑ Assuming known distributional shape of feature distributions per class, such as the Gaussian shape.
- ↑ No distributional assumption regarding shape of feature distributions per class.

## Further reading

- Fukunaga, Keinosuke (1990). *Introduction to Statistical Pattern Recognition* (2nd ed.). Boston: Academic Press. ISBN 978-0-12-269851-4. https://archive.org/details/introductiontost1990fuku.

- Hornegger, Joachim; Paulus, Dietrich W. R. (1999). *Applied Pattern Recognition: A Practical Introduction to Image and Speech Processing in C++* (2nd ed.). San Francisco: Morgan Kaufmann Publishers. ISBN 978-3-528-15558-2.

- Schuermann, Juergen (1996). *Pattern Classification: A Unified View of Statistical and Neural Approaches*. New York: Wiley. ISBN 978-0-471-13534-0.

- Toussaint, Godfried T., ed. (1988). *Computational Morphology*. Amsterdam: North-Holland Publishing Company. ISBN 9781483296722. https://books.google.com/books?id=ObOjBQAAQBAJ&printsec=frontcover#v=onepage&q&f=false.

- Kulikowski, Casimir A.; Weiss, Sholom M. (1991). *Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems*. Machine Learning. San Francisco: Morgan Kaufmann Publishers. ISBN 978-1-55860-065-2.

- Duda, Richard O.; Hart, Peter E.; Stork, David G. (2000). *Pattern Classification* (2nd ed.). Wiley-Interscience. ISBN 978-0471056690. https://books.google.com/books?id=Br33IRC3PkQC.

- Jain, Anil K.; Duin, Robert P. W.; Mao, Jianchang (2000). "Statistical pattern recognition: a review". *IEEE Transactions on Pattern Analysis and Machine Intelligence*. **22**(1): 4–37. CiteSeerX 10.1.1.123.8151. doi:10.1109/34.824819.

## External links

- Pattern Recognition (Journal of the Pattern Recognition Society)

- Open Pattern Recognition Project, intended to be an open source platform for sharing algorithms of pattern recognition

- Improved Fast Pattern Matching

Category:Machine learning

Category:Formal sciences

Category:Pattern recognition

Category:Computational fields of study

This page was moved from wikipedia:en:Pattern recognition. Its edit history can be viewed at 模式识别/edithistory
