Expectation-maximization works to improve [math]\displaystyle{ Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) }[/math] rather than directly improving [math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta) }[/math]. Here it is shown that improvements to the former imply improvements to the latter.
 
For any [math]\displaystyle{ \mathbf{Z} }[/math] with non-zero probability [math]\displaystyle{ p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta) }[/math], we can write
 
[math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta) = \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta). }[/math]
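This identity is a rearrangement of the product rule [math]\displaystyle{ p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)\,p(\mathbf{X}\mid\boldsymbol\theta) }[/math]; the non-zero probability condition ensures the ratio below is well defined:

[math]\displaystyle{ \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta) = \log \frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)\,p(\mathbf{X}\mid\boldsymbol\theta)}{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)} = \log p(\mathbf{X}\mid\boldsymbol\theta). }[/math]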
    
We take the expectation over possible values of the unknown data [math]\displaystyle{ \mathbf{Z} }[/math] under the current parameter estimate [math]\displaystyle{ \theta^{(t)} }[/math] by multiplying both sides by [math]\displaystyle{ p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) }[/math] and summing (or integrating) over [math]\displaystyle{ \mathbf{Z} }[/math]. The left-hand side is the expectation of a constant, so we get:
 
[math]\displaystyle{ \begin{align} \log p(\mathbf{X}\mid\boldsymbol\theta) & = \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta) \\ & = Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) + H(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}), \end{align} }[/math]
where [math]\displaystyle{ H(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) }[/math] is defined by the negated sum it is replacing. This last equation holds for every value of [math]\displaystyle{ \boldsymbol\theta }[/math] including [math]\displaystyle{ \boldsymbol\theta = \boldsymbol\theta^{(t)} }[/math],
    
[math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta^{(t)}) = Q(\boldsymbol\theta^{(t)}\mid\boldsymbol\theta^{(t)}) + H(\boldsymbol\theta^{(t)}\mid\boldsymbol\theta^{(t)}), }[/math]

This idea is further extended in the generalized expectation maximization (GEM) algorithm, in which only an increase in the objective function F is sought for both the E step and the M step, as described in the As a maximization–maximization procedure section.

The Q-function used in the EM algorithm is based on the log likelihood; the method is therefore regarded as the log-EM algorithm. The use of the log likelihood can be generalized to that of the α-log likelihood ratio. Then, the α-log likelihood ratio of the observed data can be exactly expressed as an equality by using the Q-function of the α-log likelihood ratio and the α-divergence. Obtaining this Q-function is a generalized E step, and its maximization is a generalized M step. This pair is called the α-EM algorithm, which contains the log-EM algorithm as its subclass. Thus, the α-EM algorithm by Yasuo Matsuyama is an exact generalization of the log-EM algorithm: no computation of a gradient or Hessian matrix is needed, and the α-EM shows faster convergence than the log-EM algorithm when an appropriate α is chosen. The α-EM algorithm also leads to a faster version of the hidden Markov model estimation algorithm, the α-HMM.

In information geometry, the E step and the M step are interpreted as projections under dual affine connections, called the e-connection and the m-connection; the Kullback–Leibler divergence can also be understood in these terms.
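The decomposition [math]\displaystyle{ \log p(\mathbf{X}\mid\boldsymbol\theta) = Q(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) + H(\boldsymbol\theta\mid\boldsymbol\theta^{(t)}) }[/math] derived above can be checked numerically. The following is a minimal sketch, assuming a toy two-component, unit-variance Gaussian mixture in which [math]\displaystyle{ \mathbf{Z} }[/math] is the vector of component assignments; the helper names (log_joint, responsibilities, Q, H) and the parameter values are illustrative choices, not part of the article.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Toy observed data X: a 1-D sample, modeled as a two-component mixture of
# unit-variance Gaussians with parameters theta = (mixing weights pi, means mu).
x = np.concatenate([rng.normal(-2.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])

def log_joint(x, pi, mu):
    """log( pi_k * N(x_i; mu_k, 1) ) for every point i and component k -> shape (N, K)."""
    return np.log(pi) - 0.5 * np.log(2 * np.pi) - 0.5 * (x[:, None] - mu) ** 2

def log_lik(x, pi, mu):
    """Observed-data log-likelihood log p(X | theta)."""
    return np.logaddexp.reduce(log_joint(x, pi, mu), axis=1).sum()

def responsibilities(x, pi, mu):
    """Posterior over the latent assignments, p(Z | X, theta), one row per point."""
    lj = log_joint(x, pi, mu)
    lj -= np.logaddexp.reduce(lj, axis=1, keepdims=True)
    return np.exp(lj)

def Q(theta, theta_t, x):
    """Q(theta | theta_t) = sum_Z p(Z | X, theta_t) log p(X, Z | theta)."""
    g_t = responsibilities(x, *theta_t)
    return (g_t * log_joint(x, *theta)).sum()

def H(theta, theta_t, x):
    """H(theta | theta_t) = -sum_Z p(Z | X, theta_t) log p(Z | X, theta)."""
    g_t = responsibilities(x, *theta_t)
    return -(g_t * np.log(responsibilities(x, *theta))).sum()

theta_t = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]))   # current estimate theta^(t)
theta   = (np.array([0.3, 0.7]), np.array([-2.5, 2.0]))   # an arbitrary other theta

# The decomposition: log p(X | theta) = Q(theta | theta_t) + H(theta | theta_t).
print(log_lik(x, *theta))                       # left-hand side
print(Q(theta, theta_t, x) + H(theta, theta_t, x))  # right-hand side, equal up to round-off

# One M step (maximizing Q over theta for this unit-variance mixture) does not
# decrease the observed-data log-likelihood, as the proof guarantees.
g = responsibilities(x, *theta_t)
theta_new = (g.mean(axis=0), (g * x[:, None]).sum(axis=0) / g.sum(axis=0))
print(log_lik(x, *theta_t) <= log_lik(x, *theta_new))   # True
</syntaxhighlight>

Because the latent assignments are conditionally independent across data points in this toy model, the sums over [math]\displaystyle{ \mathbf{Z} }[/math] factorize into per-point sums over components, which is what the responsibility matrices implement.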