However, this quantity is typically intractable, since <math>\mathbf{Z}</math> is unobserved and the distribution of <math>\mathbf{Z}</math> is unknown before attaining <math>\boldsymbol\theta</math>.
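
To see why, note that for discrete <math>\mathbf{Z}</math> the marginal likelihood is a sum over every possible joint configuration of the latent variables,

::<math>L(\boldsymbol\theta; \mathbf{X}) = p(\mathbf{X} \mid \boldsymbol\theta) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol\theta) \,</math>

and in general the number of terms in this sum grows exponentially with the number of latent variables.
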
The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying these two steps:
 
:''Expectation step (E step)'': Define <math>Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)})</math> as the expected value of the log likelihood function of <math>\boldsymbol\theta</math>, with respect to the current conditional distribution of <math>\mathbf{Z}</math> given <math>\mathbf{X}</math> and the current estimates of the parameters <math>\boldsymbol\theta^{(t)}</math>:
 
::<math>Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) = \operatorname{E}_{\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}}\left[ \log L (\boldsymbol\theta; \mathbf{X},\mathbf{Z})  \right] \,</math>
 
:''Maximization step (M step)'': Find the parameters that maximize this quantity:
   
::<math>\boldsymbol\theta^{(t+1)} = \underset{\boldsymbol\theta}{\operatorname{arg\,max}} \ Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) \, </math>
 
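For instance, in a finite mixture model with mixing weights <math>\pi_k</math> (one concrete case of the group-membership models discussed below; the notation here is illustrative), the complete-data log likelihood decomposes over the observations, so the E step amounts to weighting each component's log density by the posterior membership probabilities:

::<math>Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) = \sum_{i=1}^{n} \sum_{k} p\left(Z_i = k \mid x_i, \boldsymbol\theta^{(t)}\right) \log\left[ \pi_k \, p\left(x_i \mid Z_i = k, \boldsymbol\theta\right) \right]</math>
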
The typical models to which EM is applied use <math>\mathbf{Z}</math> as a latent variable indicating membership in one of a set of groups:
#The observed data points <math>\mathbf{X}</math> may be [[discrete random variable|discrete]] (taking values in a finite or countably infinite set) or [[continuous random variable|continuous]] (taking values in an uncountably infinite set). Associated with each data point may be a vector of observations.
#The [[missing values]] (aka [[latent variables]]) <math>\mathbf{Z}</math> are [[discrete random variable|discrete]], drawn from a fixed number of values, and with one latent variable per observed unit.
#The parameters are continuous, and are of two kinds: parameters that are associated with all data points, and those associated with a specific value of a latent variable (i.e., associated with all data points whose corresponding latent variable has that value).
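
As an illustrative sketch of this structure, consider a hypothetical two-component Gaussian mixture (the names and values below are not from the article): the mixing weights are shared by all data points, the component means and spreads are tied to specific values of the latent variable, and each observed unit carries exactly one discrete label:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

# Parameters associated with all data points: the mixing weights of the groups.
pi = np.array([0.3, 0.7])
# Parameters associated with a specific value k of the latent variable.
mu = np.array([-2.0, 3.0])
sigma = np.array([1.0, 0.5])

# One discrete latent variable per observed unit...
z = rng.choice(2, size=1000, p=pi)
# ...and one (here continuous) observed data point per unit.
x = rng.normal(mu[z], sigma[z])
</syntaxhighlight>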
    
However, it is possible to apply EM to other sorts of models.
 
The motive is as follows.  If the value of the parameters <math>\boldsymbol\theta</math> is known, usually the value of the latent variables <math>\mathbf{Z}</math> can be found by maximizing the log-likelihood over all possible values of <math>\mathbf{Z}</math>, either simply by iterating over <math>\mathbf{Z}</math> or through an algorithm such as the [[Baum–Welch algorithm]] for [[hidden Markov model]]s.  Conversely, if we know the value of the latent variables <math>\mathbf{Z}</math>, we can find an estimate of the parameters <math>\boldsymbol\theta</math> fairly easily, typically by simply grouping the observed data points according to the value of the associated latent variable and averaging the values, or some function of the values, of the points in each group.  This suggests an iterative algorithm, in the case where both <math>\boldsymbol\theta</math> and <math>\mathbf{Z}</math> are unknown:
 
#First, initialize the parameters <math>\boldsymbol\theta</math> to some random values.
#Compute the probability of each possible value of <math>\mathbf{Z}</math>, given <math>\boldsymbol\theta</math>.
#Then, use the just-computed values of <math>\mathbf{Z}</math> to compute a better estimate for the parameters <math>\boldsymbol\theta</math>.
#Iterate steps 2 and 3 until convergence.
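
A minimal, runnable sketch of these four steps, assuming a two-component one-dimensional Gaussian mixture (the model choice and all function names are illustrative, not prescribed by the text):

<syntaxhighlight lang="python">
import numpy as np

def e_step(x, pi, mu, sigma):
    """Step 2: probability of each value of Z for every observation, given
    the current parameters. The shared 1/sqrt(2*pi) constant of the normal
    density cancels in the normalization and is dropped."""
    dens = np.stack([pi[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2) / sigma[k]
                     for k in range(len(pi))], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)  # membership probabilities

def m_step(x, resp):
    """Step 3: use the just-computed membership probabilities to re-estimate
    the parameters as responsibility-weighted averages per group."""
    nk = resp.sum(axis=0)                              # effective count per group
    pi = nk / len(x)                                   # mixing weights
    mu = (resp * x[:, None]).sum(axis=0) / nk          # weighted means
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, np.sqrt(var)

def em(x, n_iter=200, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize the parameters to some random values.
    pi = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False)
    sigma = np.array([x.std(), x.std()])
    # Step 4: iterate steps 2 and 3 until convergence.
    for _ in range(n_iter):
        resp = e_step(x, pi, mu, sigma)
        new_pi, new_mu, new_sigma = m_step(x, resp)
        shift = max(np.abs(new_pi - pi).max(), np.abs(new_mu - mu).max(),
                    np.abs(new_sigma - sigma).max())
        pi, mu, sigma = new_pi, new_mu, new_sigma
        if shift < tol:
            break
    return pi, mu, sigma

# On synthetic data from the mixture sketched earlier, the recovered
# parameters approach the generating ones, up to a relabeling of the groups.
rng = np.random.default_rng(1)
z = rng.choice(2, size=1000, p=[0.3, 0.7])
x = rng.normal(np.array([-2.0, 3.0])[z], np.array([1.0, 0.5])[z])
print(em(x))
</syntaxhighlight>

The M step above is the closed-form maximizer of <math>Q</math> for this particular model: responsibility-weighted mixing proportions, means, and variances.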
    
The algorithm as just described monotonically approaches a local minimum of the cost function.
 