== Examples ==

'''<big>Gaussian mixture</big>'''

Let <math>\mathbf{x} = (\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n)</math> be a sample of <math>n</math> independent observations from a mixture of two multivariate normal distributions of dimension <math>d</math>, and let <math>\mathbf{z} = (z_1,z_2,\ldots,z_n)</math> be the latent variables that determine the component from which the observation originates.[16]
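
For concreteness, here is a minimal simulation sketch of such a sample in Python (assuming NumPy; the sample size, dimension, mixing weight <math>\tau</math> and component parameters below are illustrative choices, not values taken from the text):

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) settings: n observations in d dimensions,
# mixing weight tau for component 1, and two sets of component parameters.
n, d = 500, 2
tau = 0.4
mu = [np.zeros(d), np.full(d, 3.0)]
sigma = [np.eye(d), 0.5 * np.eye(d)]

# z_i is the latent label of the component observation i originates from:
# component 1 with probability tau, component 2 otherwise.
z = (rng.random(n) >= tau).astype(int)          # 0 -> component 1, 1 -> component 2
x = np.array([rng.multivariate_normal(mu[zi], sigma[zi]) for zi in z])
</syntaxhighlight>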
'''E step'''

Given our current estimate of the parameters <math>\theta^{(t)}</math>, the conditional distribution of the <math>Z_i</math> is determined by Bayes' theorem to be the proportional height of the normal density weighted by <math>\tau</math>:

:<math>T_{j,i}^{(t)} := \operatorname{P}(Z_i = j \mid \mathbf{x}_i; \theta^{(t)}) = \frac{\tau_j^{(t)} \, f(\mathbf{x}_i; \boldsymbol{\mu}_j^{(t)}, \Sigma_j^{(t)})}{\tau_1^{(t)} \, f(\mathbf{x}_i; \boldsymbol{\mu}_1^{(t)}, \Sigma_1^{(t)}) + \tau_2^{(t)} \, f(\mathbf{x}_i; \boldsymbol{\mu}_2^{(t)}, \Sigma_2^{(t)})}.</math>
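
As a code-level illustration of this E step, the sketch below computes the membership probabilities (responsibilities) <math>T_{j,i}^{(t)}</math> with SciPy's multivariate normal density. Here <code>tau</code>, <code>mu</code> and <code>sigma</code> are lists holding the current estimates for the two components, following the simulation sketch above; the function name and the crude initial guesses are purely illustrative.

<syntaxhighlight lang="python">
from scipy.stats import multivariate_normal
import numpy as np

def e_step(x, tau, mu, sigma):
    """Bayes' theorem: T[j, i] is proportional to tau_j * N(x_i; mu_j, Sigma_j),
    normalized over the components j."""
    weighted = np.vstack([
        tau_j * multivariate_normal(mean=mu_j, cov=sigma_j).pdf(x)
        for tau_j, mu_j, sigma_j in zip(tau, mu, sigma)
    ])                                      # shape (2, n): unnormalized densities
    return weighted / weighted.sum(axis=0)  # responsibilities T[j, i]

# Example usage with the simulated data and some crude initial guesses
T = e_step(x, [0.5, 0.5], [x[0], x[1]], [np.cov(x.T), np.cov(x.T)])
</syntaxhighlight>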
This full conditional expectation does not need to be calculated in one step, because <math>\tau</math> and <math>\boldsymbol{\mu}/\Sigma</math> appear in separate linear terms and can thus be maximized independently.
'''M step'''

<math>Q(\theta \mid \theta^{(t)})</math> being quadratic in form means that determining the maximizing values of <math>\theta</math> is relatively straightforward. Also, <math>\tau</math>, <math>(\boldsymbol{\mu}_1,\Sigma_1)</math> and <math>(\boldsymbol{\mu}_2,\Sigma_2)</math> may all be maximized independently since they all appear in separate linear terms.
For example, the updates for the second component are

:<math>\boldsymbol{\mu}_2^{(t+1)} = \frac{\sum_{i=1}^n T_{2,i}^{(t)} \mathbf{x}_i}{\sum_{i=1}^n T_{2,i}^{(t)}}</math> and <math>\Sigma_2^{(t+1)} = \frac{\sum_{i=1}^n T_{2,i}^{(t)} (\mathbf{x}_i - \boldsymbol{\mu}_2^{(t+1)})(\mathbf{x}_i - \boldsymbol{\mu}_2^{(t+1)})^\top}{\sum_{i=1}^n T_{2,i}^{(t)}}.</math>
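
Written as code and mirroring the update formulas above, an M step for both components might look as follows (a sketch building on the <code>e_step</code> sketch above; variable names are illustrative):

<syntaxhighlight lang="python">
import numpy as np

def m_step(x, T):
    """Closed-form maximizers: each component's weight is its average
    responsibility, its mean a responsibility-weighted average of the data,
    and its covariance a responsibility-weighted scatter about that mean."""
    n = x.shape[0]
    tau_new, mu_new, sigma_new = [], [], []
    for T_j in T:                                    # loop over components
        w = T_j.sum()
        tau_new.append(w / n)
        mu_j = (T_j[:, None] * x).sum(axis=0) / w
        mu_new.append(mu_j)
        diff = x - mu_j
        sigma_new.append((T_j[:, None, None]
                          * np.einsum('ij,ik->ijk', diff, diff)).sum(axis=0) / w)
    return tau_new, mu_new, sigma_new
</syntaxhighlight>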
==== Termination ====

Conclude the iterative process if <math>E_{Z\mid \theta^{(t)},\mathbf{x}}[\log L(\theta^{(t)};\mathbf{x},\mathbf{Z})] \leq E_{Z\mid \theta^{(t-1)},\mathbf{x}}[\log L(\theta^{(t-1)};\mathbf{x},\mathbf{Z})] + \varepsilon</math> for <math>\varepsilon</math> below some preset threshold.
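
A sketch of the full iteration with such a stopping rule, reusing the <code>e_step</code> and <code>m_step</code> sketches and the simulated data <code>x</code> above. As a simplifying assumption of the sketch, the expected complete-data log-likelihood is replaced by the observed-data log-likelihood, which EM also increases monotonically, so the same <math>\varepsilon</math>-based test applies; the threshold value and initialization are illustrative.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(x, tau, mu, sigma):
    """Observed-data log-likelihood of the mixture."""
    dens = np.vstack([
        tau_j * multivariate_normal(mean=mu_j, cov=sigma_j).pdf(x)
        for tau_j, mu_j, sigma_j in zip(tau, mu, sigma)
    ])
    return np.log(dens.sum(axis=0)).sum()

# Crude initialization (illustrative), then iterate until the improvement
# in log-likelihood falls below the preset threshold eps.
tau, mu, sigma = [0.5, 0.5], [x[0], x[1]], [np.cov(x.T), np.cov(x.T)]
eps, old_ll = 1e-6, -np.inf
while True:
    T = e_step(x, tau, mu, sigma)
    tau, mu, sigma = m_step(x, T)
    new_ll = log_likelihood(x, tau, mu, sigma)
    if new_ll <= old_ll + eps:
        break
    old_ll = new_ll
</syntaxhighlight>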
'''Generalization'''

The algorithm illustrated above can be generalized for mixtures of more than two multivariate normal distributions.
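
The sketches above already generalize in this way: because <code>e_step</code> and <code>m_step</code> take lists of component parameters, a mixture of, say, three components only needs longer lists (the initial values below are illustrative):

<syntaxhighlight lang="python">
K = 3
tau = [1.0 / K] * K                      # equal initial mixing weights
mu = [x[i] for i in range(K)]            # crude initial means taken from the data
sigma = [np.cov(x.T) for _ in range(K)]  # shared initial covariance estimate

T = e_step(x, tau, mu, sigma)            # responsibilities, shape (K, n)
tau, mu, sigma = m_step(x, T)            # updated parameters for all K components
</syntaxhighlight>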
'''Truncated and censored regression'''

The EM algorithm has been implemented in the case where an underlying linear regression model exists explaining the variation of some quantity, but where the values actually observed are censored or truncated versions of those represented in the model. Special cases of this model include censored or truncated observations from one normal distribution.

== Alternatives ==

EM typically converges to a local optimum, not necessarily the global optimum, and in general there is no bound on its rate of convergence. In high dimensions it can perform arbitrarily poorly, and the number of local optima can be exponential. Hence there is a need for alternative methods with learning guarantees, especially in the high-dimensional setting. Alternatives to EM with better consistency guarantees exist; these are termed ''moment-based approaches'' or ''spectral techniques''. Moment-based approaches to learning the parameters of a probabilistic model have attracted increasing interest because, under certain conditions, they enjoy guarantees such as global convergence, unlike EM, which is often plagued by getting stuck in local optima. Algorithms with learning guarantees can be derived for a number of important models such as mixture models and HMMs. For these spectral methods, no spurious local optima occur, and the true parameters can be consistently estimated under some regularity conditions.
    
<small>This page was moved from [[wikipedia:en:Expectation–maximization algorithm]]. Its edit history can be viewed at [[EM算法/edithistory]]</small>

[[Category:待整理页面]]