Expectation-maximization works to improve [math]\displaystyle{ Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) }[/math] rather than directly improving [math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta) }[/math]. Here it is shown that improvements to the former imply improvements to the latter.
 
For any [math]\displaystyle{ \mathbf{Z} }[/math] with non-zero probability [math]\displaystyle{ p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta) }[/math], we can write
 
[math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta) = \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta). }[/math]
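This identity is a rearrangement of the product rule [math]\displaystyle{ p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)\,p(\mathbf{X}\mid\boldsymbol\theta) }[/math]; the non-zero probability condition ensures the ratio below is well defined:

[math]\displaystyle{ \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta) = \log \frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)\,p(\mathbf{X}\mid\boldsymbol\theta)}{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)} = \log p(\mathbf{X}\mid\boldsymbol\theta). }[/math]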
    
We take the expectation over possible values of the unknown data [math]\displaystyle{ \mathbf{Z} }[/math] under the current parameter estimate [math]\displaystyle{ \theta^{(t)} }[/math] by multiplying both sides by [math]\displaystyle{ p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) }[/math] and summing (or integrating) over [math]\displaystyle{ \mathbf{Z} }[/math]. The left-hand side is the expectation of a constant, so we get:
 
[math]\displaystyle{ \begin{align} \log p(\mathbf{X}\mid\boldsymbol\theta) & = \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta) \\ & = Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) + H(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}), \end{align} }[/math]
where [math]\displaystyle{ H(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) }[/math] is defined by the negated sum it is replacing. This last equation holds for every value of [math]\displaystyle{ \boldsymbol\theta }[/math] including [math]\displaystyle{ \boldsymbol\theta = \boldsymbol\theta^{(t)} }[/math],
    
[math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta^{(t)}) = Q(\boldsymbol\theta^{(t)}\mid\boldsymbol\theta^{(t)}) + H(\boldsymbol\theta^{(t)}\mid\boldsymbol\theta^{(t)}), }[/math]

This idea is further extended in the generalized expectation maximization (GEM) algorithm, in which only an increase in the objective function F is sought for both the E step and the M step, as described in the As a maximization–maximization procedure section.

The Q-function used in the EM algorithm is based on the log likelihood; the method is therefore regarded as the log-EM algorithm. The use of the log likelihood can be generalized to that of the α-log likelihood ratio. Then, the α-log likelihood ratio of the observed data can be exactly expressed as an equality by using the Q-function of the α-log likelihood ratio and the α-divergence. Obtaining this Q-function is a generalized E step, and its maximization is a generalized M step. This pair is called the α-EM algorithm, which contains the log-EM algorithm as its subclass. Thus, the α-EM algorithm by Yasuo Matsuyama is an exact generalization of the log-EM algorithm: no computation of a gradient or Hessian matrix is needed, and the α-EM shows faster convergence than the log-EM algorithm when an appropriate α is chosen. The α-EM algorithm also leads to a faster version of the hidden Markov model estimation algorithm, the α-HMM.

In information geometry, the E step and the M step are interpreted as projections under dual affine connections, called the e-connection and the m-connection; the Kullback–Leibler divergence can also be understood in these terms.
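The decomposition [math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta) = Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) + H(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) }[/math] derived above can be checked numerically. The following is a minimal sketch, assuming a toy two-component, unit-variance Gaussian mixture in which [math]\displaystyle{ \mathbf{Z} }[/math] is the vector of component assignments; the helper names (log_joint, responsibilities, Q, H) and the parameter values are illustrative choices, not part of the article.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Toy observed data X: a 1-D sample, modeled as a two-component mixture of
# unit-variance Gaussians with parameters theta = (mixing weights pi, means mu).
x = np.concatenate([rng.normal(-2.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])

def log_joint(x, pi, mu):
    """log( pi_k * N(x_i; mu_k, 1) ) for every point i and component k -> shape (N, K)."""
    return np.log(pi) - 0.5 * np.log(2 * np.pi) - 0.5 * (x[:, None] - mu) ** 2

def log_lik(x, pi, mu):
    """Observed-data log-likelihood log p(X | theta)."""
    return np.logaddexp.reduce(log_joint(x, pi, mu), axis=1).sum()

def responsibilities(x, pi, mu):
    """Posterior over the latent assignments, p(Z | X, theta), one row per point."""
    lj = log_joint(x, pi, mu)
    lj -= np.logaddexp.reduce(lj, axis=1, keepdims=True)
    return np.exp(lj)

def Q(theta, theta_t, x):
    """Q(theta | theta_t) = sum_Z p(Z | X, theta_t) log p(X, Z | theta)."""
    g_t = responsibilities(x, *theta_t)
    return (g_t * log_joint(x, *theta)).sum()

def H(theta, theta_t, x):
    """H(theta | theta_t) = -sum_Z p(Z | X, theta_t) log p(Z | X, theta)."""
    g_t = responsibilities(x, *theta_t)
    return -(g_t * np.log(responsibilities(x, *theta))).sum()

theta_t = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]))   # current estimate theta^(t)
theta   = (np.array([0.3, 0.7]), np.array([-2.5, 2.0]))   # an arbitrary other theta

# The decomposition: log p(X | theta) = Q(theta | theta_t) + H(theta | theta_t).
print(log_lik(x, *theta))                       # left-hand side
print(Q(theta, theta_t, x) + H(theta, theta_t, x))  # right-hand side, equal up to round-off

# One M step (maximizing Q over theta for this unit-variance mixture) does not
# decrease the observed-data log-likelihood, as the proof guarantees.
g = responsibilities(x, *theta_t)
theta_new = (g.mean(axis=0), (g * x[:, None]).sum(axis=0) / g.sum(axis=0))
print(log_lik(x, *theta_t) <= log_lik(x, *theta_new))   # True
</syntaxhighlight>

Because the latent assignments are conditionally independent across data points in this toy model, the sums over [math]\displaystyle{ \mathbf{Z} }[/math] factorize into per-point sums over components, which is what the responsibility matrices implement.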