Generative model


In statistical classification, two main approaches are called the generative approach and the discriminative approach. These compute classifiers by different approaches, differing in the degree of statistical modelling. Terminology is inconsistent, but three major types can be distinguished, following Jebara (2004):

  1. A generative model is a statistical model of the joint probability distribution [math]\displaystyle{ P(X, Y) }[/math] on a given observable variable X and target variable Y;[1]
  2. A discriminative model is a model of the conditional probability [math]\displaystyle{ P(Y\mid X = x) }[/math] of the target Y, given an observation x; and
  3. Classifiers computed without using a probability model are also referred to loosely as "discriminative".

The distinction between these last two classes is not consistently made;[2] Jebara (2004) refers to these three classes as generative learning, conditional learning, and discriminative learning, but Ng & Jordan (2002) only distinguish two classes, calling them generative classifiers (joint distribution) and discriminative classifiers (conditional distribution or no distribution), not distinguishing between the latter two classes.[3] Analogously, a classifier based on a generative model is a generative classifier, while a classifier based on a discriminative model is a discriminative classifier, though this term also refers to classifiers that are not based on a model.



Standard examples of each, all of which are linear classifiers, are (see the code sketch after this list):


  • generative classifiers:
    • naive Bayes classifier and
    • linear discriminant analysis
  • discriminative model:
    • logistic regression
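
For instance, the following minimal sketch (assuming scikit-learn is available; the toy data are illustrative, not from the article) fits one classifier of each kind to the same training set: the generative naive Bayes classifier arrives at class probabilities via [math]\displaystyle{ P(X\mid Y)P(Y) }[/math], while the discriminative logistic regression estimates [math]\displaystyle{ P(Y\mid X) }[/math] directly.

  # A generative and a discriminative classifier fitted to the same toy data
  # (assumes scikit-learn is installed; the data are illustrative).
  import numpy as np
  from sklearn.naive_bayes import GaussianNB
  from sklearn.linear_model import LogisticRegression

  X = np.array([[-2.0], [-1.0], [1.5], [2.5]])  # one feature per sample
  y = np.array([0, 0, 1, 1])

  gen = GaussianNB().fit(X, y)           # models P(X|Y) and P(Y)
  disc = LogisticRegression().fit(X, y)  # models P(Y|X) directly

  print(gen.predict_proba([[0.5]]))   # posterior obtained via Bayes' rule
  print(disc.predict_proba([[0.5]]))  # posterior modelled directly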

In application to classification, one wishes to go from an observation x to a label y (or probability distribution on labels). One can compute this directly, without using a probability distribution (distribution-free classifier); one can estimate the probability of a label given an observation, [math]\displaystyle{ P(Y|X=x) }[/math] (discriminative model), and base classification on that; or one can estimate the joint distribution [math]\displaystyle{ P(X, Y) }[/math] (generative model), from that compute the conditional probability [math]\displaystyle{ P(Y|X=x) }[/math], and then base classification on that. These are increasingly indirect, but increasingly probabilistic, allowing more domain knowledge and probability theory to be applied. In practice different approaches are used, depending on the particular problem, and hybrids can combine strengths of multiple approaches.

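As a minimal illustration of the most indirect route, the following Python sketch (reusing the four data points from the "Simple example" section below; the helper names are illustrative) estimates the joint distribution empirically, derives the conditional from it by the definition of conditional probability, and then bases classification on that conditional.

  # Generative route: estimate P(X, Y), derive P(Y | X = x), then classify.
  from collections import Counter

  data = [(1, 0), (1, 1), (2, 0), (2, 0)]  # observations (x, y)

  # Empirical joint distribution P(X, Y).
  p_xy = {pair: n / len(data) for pair, n in Counter(data).items()}

  def p_y_given_x(y, x):
      # P(Y = y | X = x) = P(X = x, Y = y) / P(X = x).
      p_x = sum(p for (xi, _), p in p_xy.items() if xi == x)
      return p_xy.get((x, y), 0.0) / p_x

  def classify(x, labels=(0, 1)):
      # Base classification on the derived conditional: argmax over labels.
      return max(labels, key=lambda y: p_y_given_x(y, x))

  print(classify(1), classify(2))  # a discriminative model would fit P(Y|X) directly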

Definition

An alternative division defines these symmetrically as:


  • a generative model is a model of the conditional probability of the observable X, given a target y, symbolically, [math]\displaystyle{ P(X\mid Y = y) }[/math][4]
  • a discriminative model is a model of the conditional probability of the target Y, given an observation x, symbolically, [math]\displaystyle{ P(Y\mid X = x) }[/math][5]

Regardless of precise definition, the terminology is constitutive because a generative model can be used to "generate" random instances (outcomes), either of an observation and target [math]\displaystyle{ (x, y) }[/math], or of an observation x given a target value y,[4] while a discriminative model or discriminative classifier (without a model) can be used to "discriminate" the value of the target variable Y, given an observation x.[5] The difference between "discriminate" (distinguish) and "classify" is subtle, and these are not consistently distinguished. (The term "discriminative classifier" becomes a pleonasm when "discrimination" is equivalent to "classification".)


The term "generative model" is also used to describe models that generate instances of output variables in a way that has no clear relationship to probability distributions over potential samples of input variables. Generative adversarial networks are examples of this class of generative models, and are judged primarily by the similarity of particular outputs to potential inputs. Such models are not classifiers.


Relationships between models

In application to classification, the observable X is frequently a continuous variable, the target Y is generally a discrete variable consisting of a finite set of labels, and the conditional probability [math]\displaystyle{ P(Y\mid X) }[/math] can also be interpreted as a (non-deterministic) target function [math]\displaystyle{ f\colon X \to Y }[/math], considering X as inputs and Y as outputs.


Given a finite set of labels, the two definitions of "generative model" are closely related. A model of the conditional distribution [math]\displaystyle{ P(X\mid Y = y) }[/math] is a model of the distribution of each label, and a model of the joint distribution is equivalent to a model of the distribution of label values [math]\displaystyle{ P(Y) }[/math], together with the distribution of observations given a label, [math]\displaystyle{ P(X\mid Y) }[/math]; symbolically, [math]\displaystyle{ P(X, Y) = P(X\mid Y)P(Y). }[/math] Thus, while a model of the joint probability distribution is more informative than a model of the distribution of each label (which omits their relative frequencies), it is a relatively small step between the two, hence these are not always distinguished.


Given a model of the joint distribution, [math]\displaystyle{ P(X, Y) }[/math], the distribution of the individual variables can be computed as the marginal distributions [math]\displaystyle{ P(X) = \sum_y P(X , Y = y) }[/math] and [math]\displaystyle{ P(Y) = \int_x P(Y, X = x) }[/math] (considering X as continuous, hence integrating over it, and Y as discrete, hence summing over it), and either conditional distribution can be computed from the definition of conditional probability: [math]\displaystyle{ P(X\mid Y)=P(X, Y)/P(Y) }[/math] and [math]\displaystyle{ P(Y\mid X)=P(X, Y)/P(X) }[/math].

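For a finite example, both variables are discrete and both marginals are sums; the following sketch computes the marginals and both conditionals from the toy joint table used in the "Simple example" section below, and also checks the factorization [math]\displaystyle{ P(X, Y) = P(X\mid Y)P(Y) }[/math] from the previous paragraph.

  # Marginalize and condition a finite joint distribution P(X, Y).
  import numpy as np

  # Rows index x in {1, 2}; columns index y in {0, 1} (toy table from below).
  p_xy = np.array([[0.25, 0.25],
                   [0.50, 0.00]])

  p_x = p_xy.sum(axis=1)             # P(X) = sum_y P(X, Y = y)
  p_y = p_xy.sum(axis=0)             # P(Y) = sum_x P(X = x, Y)
  p_y_given_x = p_xy / p_x[:, None]  # P(Y|X) = P(X, Y) / P(X)
  p_x_given_y = p_xy / p_y[None, :]  # P(X|Y) = P(X, Y) / P(Y)

  assert np.allclose(p_x_given_y * p_y[None, :], p_xy)  # P(X,Y) = P(X|Y)P(Y)
  print(p_y_given_x)  # [[0.5 0.5] [1.  0. ]]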

Given a model of one conditional probability, and estimated probability distributions for the variables X and Y, denoted [math]\displaystyle{ P(X) }[/math] and [math]\displaystyle{ P(Y) }[/math], one can estimate the opposite conditional probability using Bayes' rule:

[math]\displaystyle{ P(X\mid Y)P(Y) = P(Y\mid X)P(X). }[/math]

For example, given a generative model for [math]\displaystyle{ P(X\mid Y) }[/math], one can estimate:

[math]\displaystyle{ P(Y\mid X) = P(X\mid Y)P(Y)/P(X), }[/math]

and given a discriminative model for [math]\displaystyle{ P(Y\mid X) }[/math], one can estimate:

[math]\displaystyle{ P(X\mid Y) = P(Y\mid X)P(X)/P(Y). }[/math]

Note that Bayes' rule (computing one conditional probability in terms of the other) and the definition of conditional probability (computing conditional probability in terms of the joint distribution) are frequently conflated as well.

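Numerically, continuing the same toy distribution (a sketch; it assumes the generative model [math]\displaystyle{ P(X\mid Y) }[/math] and the prior [math]\displaystyle{ P(Y) }[/math] have already been estimated):

  # Invert a generative model P(X|Y) into P(Y|X) with Bayes' rule.
  import numpy as np

  p_x_given_y = np.array([[1/3, 1.0],   # rows: x in {1, 2}
                          [2/3, 0.0]])  # columns: y in {0, 1}
  p_y = np.array([0.75, 0.25])

  # P(X) follows by marginalization: P(X = x) = sum_y P(X = x | Y = y) P(Y = y).
  p_x = p_x_given_y @ p_y

  # Bayes' rule: P(Y|X) = P(X|Y) P(Y) / P(X).
  p_y_given_x = p_x_given_y * p_y[None, :] / p_x[:, None]
  print(p_y_given_x)  # [[0.5 0.5] [1.  0. ]]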

Contrast with discriminative classifiers

A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal? A discriminative algorithm does not care about how the data was generated; it simply categorizes a given signal. So, discriminative algorithms try to learn [math]\displaystyle{ p(y|x) }[/math] directly from the data and then try to classify data. On the other hand, generative algorithms try to learn [math]\displaystyle{ p(x,y) }[/math], which can be transformed into [math]\displaystyle{ p(y|x) }[/math] later to classify the data. One of the advantages of generative algorithms is that you can use [math]\displaystyle{ p(x,y) }[/math] to generate new data similar to existing data. On the other hand, it has been shown that some discriminative algorithms give better performance than some generative algorithms in classification tasks.[6]

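The generative advantage mentioned above can be made concrete: once [math]\displaystyle{ p(x,y) }[/math] is available (factored here as P(Y) and P(X|Y) for the same toy distribution), new data resembling the training data can be drawn by ancestral sampling. A minimal sketch:

  # Sample (x, y) pairs from the learned joint: y ~ P(Y), then x ~ P(X|Y=y).
  import random

  p_y = {0: 0.75, 1: 0.25}
  p_x_given_y = {0: {1: 1/3, 2: 2/3},  # P(X | Y = 0)
                 1: {1: 1.0, 2: 0.0}}  # P(X | Y = 1)

  def sample():
      y = random.choices(list(p_y), weights=list(p_y.values()))[0]
      px = p_x_given_y[y]
      x = random.choices(list(px), weights=list(px.values()))[0]
      return x, y

  print([sample() for _ in range(5)])  # e.g. [(2, 0), (1, 0), (1, 1), ...]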

Despite the fact that discriminative models do not need to model the distribution of the observed variables, they cannot generally express complex relationships between the observed and target variables. But in general, they don't necessarily perform better than generative models at classification and regression tasks. The two classes are seen as complementary or as different views of the same procedure.[7]


Deep generative models

With the rise of deep learning, a new family of methods, called deep generative models (DGMs),[8][9] is formed through the combination of generative models and deep neural networks. An increase in the scale of the neural networks is typically accompanied by an increase in the scale of the training data, both of which are required for good performance.[10]


Popular DGMs include variational autoencoders (VAEs), generative adversarial networks (GANs), and auto-regressive models. Recently, there has been a trend to build very large deep generative models.[8] For example, GPT-3 and its precursor GPT-2[11] are auto-regressive neural language models containing billions of parameters; BigGAN[12] and VQ-VAE[13], used for image generation, can have hundreds of millions of parameters; and Jukebox is a very large generative model for musical audio that contains billions of parameters.[14]


Types


Generative models


Types of generative models are:


  • Gaussian mixture model (and other types of mixture model)
  • Hidden Markov model
  • Probabilistic context-free grammar
  • Bayesian network (e.g. Naive Bayes, Autoregressive model)
  • Averaged one-dependence estimators
  • Latent Dirichlet allocation
  • Boltzmann machine (e.g. Restricted Boltzmann machine, Deep belief network)
  • Variational autoencoder
  • Generative adversarial network
  • Flow-based generative model
  • Energy-based model



If the observed data are truly sampled from the generative model, then fitting the parameters of the generative model to maximize the data likelihood is a common method. However, since most statistical models are only approximations to the true distribution, if the model's application is to infer about a subset of variables conditional on known values of others, then it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand. In such cases, it can be more accurate to model the conditional density functions directly using a discriminative model (see below), although application-specific details will ultimately dictate which approach is most suitable in any particular case.

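A minimal sketch of this maximum-likelihood fitting, assuming Gaussian class-conditionals (the one-dimensional data here are synthetic and illustrative): the MLE for each class is simply its empirical prior, mean, and variance, after which classification picks the class with the highest fitted joint density.

  # Fit a Gaussian class-conditional generative model by maximum likelihood.
  import numpy as np

  rng = np.random.default_rng(0)
  x = np.concatenate([rng.normal(-1, 1, 60), rng.normal(2, 1, 40)])
  y = np.array([0] * 60 + [1] * 40)

  params = {}
  for c in (0, 1):
      xc = x[y == c]
      params[c] = (len(xc) / len(x),  # prior P(Y = c)
                   xc.mean(),         # MLE mean of P(X | Y = c)
                   xc.var())          # MLE variance of P(X | Y = c)

  def log_joint(xi, c):
      # log P(X = xi, Y = c) = log P(Y = c) + log Gaussian density.
      prior, mu, var = params[c]
      return np.log(prior) - 0.5 * (np.log(2 * np.pi * var) + (xi - mu) ** 2 / var)

  print(max((0, 1), key=lambda c: log_joint(0.7, c)))  # most likely class for x = 0.7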

Discriminative models


  • k-nearest neighbors algorithm
  • Logistic regression
  • Support vector machine
  • Decision tree learning
  • Random forest
  • Maximum-entropy Markov models
  • Conditional random fields



Examples

Simple example

Suppose the input data is [math]\displaystyle{ x \in \{1, 2\} }[/math], the set of labels for [math]\displaystyle{ x }[/math] is [math]\displaystyle{ y \in \{0, 1\} }[/math], and there are the following 4 data points: [math]\displaystyle{ (x,y) = \{(1,0), (1,1), (2,0), (2,0)\} }[/math]


For the above data, estimating the joint probability distribution [math]\displaystyle{ p(x,y) }[/math] from the empirical measure yields the following:

         y=0    y=1
  x=1    1/4    1/4
  x=2    2/4    0


while [math]\displaystyle{ p(y|x) }[/math] will be the following:

         y=0    y=1
  x=1    1/2    1/2
  x=2    1      0

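Both tables can be reproduced mechanically from the four data points; a minimal sketch using exact fractions:

  # Empirical joint p(x, y) and the conditional p(y|x) derived from it.
  from collections import Counter
  from fractions import Fraction

  data = [(1, 0), (1, 1), (2, 0), (2, 0)]
  counts = Counter(data)
  n = len(data)

  p_xy = {(x, y): Fraction(counts[(x, y)], n) for x in (1, 2) for y in (0, 1)}
  p_x = {x: p_xy[(x, 0)] + p_xy[(x, 1)] for x in (1, 2)}
  p_y_given_x = {(x, y): p_xy[(x, y)] / p_x[x] for x in (1, 2) for y in (0, 1)}

  print(p_xy)         # joint: 1/4, 1/4, 2/4, 0
  print(p_y_given_x)  # conditional: 1/2, 1/2, 1, 0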

Text generation

Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with "representing and speedily is an good"; this is not proper English, but it will increasingly approximate proper English as the table is moved from word pairs to word triplets, and so on.

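A minimal sketch of this kind of generation, drawing successive words according to word-pair (bigram) frequencies; the tiny corpus here is illustrative, not Shannon's actual frequency table:

  # Generate text from bigram frequencies estimated from a toy corpus.
  import random
  from collections import defaultdict

  corpus = "the cat sat on the mat and the dog sat on the log".split()

  bigrams = defaultdict(list)
  for w1, w2 in zip(corpus, corpus[1:]):
      bigrams[w1].append(w2)  # repeated successors encode the frequencies

  word, words = "the", ["the"]
  for _ in range(8):
      if word not in bigrams:  # no recorded successor; stop
          break
      word = random.choice(bigrams[word])  # draw proportionally to frequency
      words.append(word)
  print(" ".join(words))  # e.g. "the dog sat on the mat and the cat"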

See also

  • Discriminative model
  • Graphical model



References

  1. Ng, Andrew Y.; Jordan, Michael I. (2002). "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes". Advances in Neural Information Processing Systems 14: "Generative classifiers learn a model of the joint probability, [math]\displaystyle{ p(x, y) }[/math], of the inputs x and the label y, and make their predictions by using Bayes rules to calculate [math]\displaystyle{ p(y\mid x) }[/math], and then picking the most likely label y."
  2. Jebara, Tony (2004). Machine Learning: Discriminative and Generative. Kluwer Academic Publishers: "This distinction between conditional learning and discriminative learning is not currently a well established convention in the field."
  3. Ng & Jordan (2002): "Discriminative classifiers model the posterior [math]\displaystyle{ p(y|x) }[/math] directly, or learn a direct map from inputs x to the class labels."
  4. Mitchell, Tom M. (2015). "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression". Machine Learning (draft chapter): "We can use Bayes rule as the basis for designing learning algorithms (function approximators), as follows: Given that we wish to learn some target function [math]\displaystyle{ f\colon X \to Y }[/math], or equivalently, [math]\displaystyle{ P(Y\mid X) }[/math], we use the training data to learn estimates of [math]\displaystyle{ P(X\mid Y) }[/math] and [math]\displaystyle{ P(Y) }[/math]. New X examples can then be classified using these estimated probability distributions, plus Bayes rule. This type of classifier is called a generative classifier, because we can view the distribution [math]\displaystyle{ P(X\mid Y) }[/math] as describing how to generate random instances X conditioned on the target attribute Y."
  5. Mitchell (2015): "Logistic Regression is a function approximation algorithm that uses training data to directly estimate [math]\displaystyle{ P(Y\mid X) }[/math], in contrast to Naive Bayes. In this sense, Logistic Regression is often referred to as a discriminative classifier because we can view the distribution [math]\displaystyle{ P(Y\mid X) }[/math] as directly discriminating the value of the target value Y for any given instance X."
  6. Ng & Jordan (2002).
  7. Bishop, C. M.; Lasserre, J. (2007). "Generative or Discriminative? Getting the best of both worlds". In Bernardo, J. M. (ed.). Bayesian Statistics 8: Proceedings of the Eighth Valencia International Meeting, June 2–6, 2006. Oxford University Press. pp. 3–23. ISBN 978-0-19-921465-5.
  8. "Scaling up—researchers advance large-scale deep generative models". April 9, 2020.
  9. "Generative Models". OpenAI. June 16, 2016.
  10. Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario (2020). "Scaling Laws for Neural Language Models". arXiv:2001.08361 [stat.ML].
  11. "Better Language Models and Their Implications". OpenAI. February 14, 2019.
  12. Brock, Andrew; Donahue, Jeff; Simonyan, Karen (2018). "Large Scale GAN Training for High Fidelity Natural Image Synthesis". arXiv:1809.11096 [cs.LG].
  13. Razavi, Ali; van den Oord, Aaron; Vinyals, Oriol (2019). "Generating Diverse High-Fidelity Images with VQ-VAE-2". arXiv:1906.00446 [cs.LG].
  14. "Jukebox". OpenAI. April 30, 2020.




This page was moved from wikipedia:en:Generative model. Its edit history can be viewed at 生成模型/edithistory