The EM algorithm was explained and given its name in a classic 1977 paper by [[Arthur P. Dempster|Arthur Dempster]], [[Nan Laird]], and [[Donald Rubin]].<ref name="Dempster1977">{{cite journal |last1=Dempster |first1=A.P. |author-link1=Arthur P. Dempster |last2=Laird |first2=N.M. |author-link2=Nan Laird |last3=Rubin |first3=D.B. |author-link3=Donald Rubin |title=Maximum Likelihood from Incomplete Data via the EM Algorithm |journal=[[Journal of the Royal Statistical Society, Series B]] |year=1977 |volume=39 |issue=1 |pages=1–38 |jstor=2984875 |mr=0501537}}</ref> They pointed out that the method had been "proposed many times in special circumstances" by earlier authors. One of the earliest is the gene-counting method for estimating allele frequencies by Cedric Smith. A very detailed treatment of the EM method for exponential families was published by Rolf Sundberg in his thesis and several papers, following his collaboration with Per Martin-Löf and Anders Martin-Löf.<!-- * Martin-Löf, P. "Exact tests, confidence regions and estimates", with a discussion by A. W. F. Edwards, G. A. Barnard, D. A. Sprott, O. Barndorff-Nielsen, D. Basu and G. Rasch. Proceedings of Conference on Foundational Questions in Statistical Inference (Aarhus, 1973), pp. 121–138. Memoirs, No. 1, Dept. Theoret. Statist., Inst. Math., Univ. Aarhus, Aarhus, 1974. --> The Dempster–Laird–Rubin paper in 1977 generalized the method and sketched a convergence analysis for a wider class of problems, and it established the EM method as an important tool of statistical analysis.
   
The convergence analysis of the Dempster–Laird–Rubin algorithm was flawed, and a correct convergence analysis was published by C. F. Jeff Wu in 1983. Wu's proof established the EM method's convergence outside of the exponential family, as claimed by Dempster–Laird–Rubin.
== Introduction ==
 
The EM algorithm is used to find (local) [[maximum likelihood]] parameters of a [[statistical model]] in cases where the equations cannot be solved directly. Typically these models involve [[latent variable]]s in addition to unknown [[parameters]] and known data observations. That is, either [[missing values]] exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points. For example, a [[mixture model]] can be described more simply by assuming that each observed data point has a corresponding unobserved data point, or latent variable, specifying the mixture component to which each data point belongs.
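To make the latent-variable picture concrete, the short Python sketch below (the variable names and parameter values are illustrative only, not part of the algorithm) generates data from a two-component mixture: only <code>x</code> would be observed, while the component indicator <code>z</code> plays the role of the unobserved data point paired with each observation.

<syntaxhighlight lang="python">
# Illustrative only: a two-component Gaussian mixture with a latent
# component indicator z for every observed point x.
import numpy as np

rng = np.random.default_rng(42)
weights = [0.4, 0.6]                  # mixing proportions (assumed values)
means, sds = [-2.0, 3.0], [1.0, 0.5]  # per-component parameters (assumed)

z = rng.choice(2, size=1000, p=weights)              # latent: which component
x = rng.normal(np.take(means, z), np.take(sds, z))   # observed data

# In practice only x is available; z is the "missing" data whose
# distribution EM estimates while fitting the unknown parameters.
</syntaxhighlight>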
Finding a maximum likelihood solution typically requires taking the [[derivative]]s of the [[likelihood function]] with respect to all the unknown values, the parameters and the latent variables, and simultaneously solving the resulting equations. In statistical models with latent variables, this is usually impossible. Instead, the result is typically a set of interlocking equations in which the solution to the parameters requires the values of the latent variables and vice versa, but substituting one set of equations into the other produces an unsolvable equation.
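As a concrete sketch of such interlocking equations (using notation that is not defined elsewhere in this article), consider a Gaussian mixture and write <math>\gamma_{ik}</math> for the posterior probability that observation <math>x_i</math> was generated by component <math>k</math>. Setting the derivatives of the log-likelihood to zero yields conditions of the form

<math>
\gamma_{ik} = \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j} \pi_j\,\mathcal{N}(x_i \mid \mu_j, \sigma_j^2)},
\qquad
\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}},
</math>

so the <math>\gamma_{ik}</math> cannot be evaluated without the parameters <math>\pi_k, \mu_k, \sigma_k^2</math>, and the parameters cannot be evaluated without the <math>\gamma_{ik}</math>.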
The EM algorithm proceeds from the observation that there is a way to solve these two sets of equations numerically. One can simply pick arbitrary values for one of the two sets of unknowns, use them to estimate the second set, then use these new values to find a better estimate of the first set, and then keep alternating between the two until the resulting values both converge to fixed points.  It's not obvious that this will work, but it can be proven that in this context it does, and that the derivative of the likelihood is (arbitrarily close to) zero at that point, which in turn means that the point is either a maximum or a [[saddle point]].<ref name="Wu" /> In general, multiple maxima may occur, with no guarantee that the global maximum will be found.  Some likelihoods also have [[Mathematical singularity|singularities]] in them, i.e., nonsensical maxima.  For example, one of the ''solutions'' that may be found by EM in a mixture model involves setting one of the components to have zero variance and the mean parameter for the same component to be equal to one of the data points.
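The alternating procedure just described can be sketched in a few lines of code. The following Python is a minimal illustration for a two-component one-dimensional Gaussian mixture (the starting values, iteration cap, and stopping rule are arbitrary choices, not prescribed by the algorithm): it repeatedly estimates the latent responsibilities from the current parameters and then re-estimates the parameters from those responsibilities until the log-likelihood stops changing.

<syntaxhighlight lang="python">
import numpy as np

def em_two_gaussians(x, n_iter=200, tol=1e-8):
    # Arbitrary starting values for the unknown parameters.
    mu = np.array([x.min(), x.max()], dtype=float)  # component means
    sigma2 = np.full(2, x.var())                    # component variances
    pi = np.array([0.5, 0.5])                       # mixing weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Estimate the latent variables (responsibilities) from the
        # current parameter values.
        dens = (np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2))
                / np.sqrt(2 * np.pi * sigma2))
        joint = pi * dens
        resp = joint / joint.sum(axis=1, keepdims=True)
        # Re-estimate the parameters from the responsibilities.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma2 = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / x.size
        # Keep alternating until the log-likelihood reaches a fixed point.
        ll = np.log(joint.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return mu, sigma2, pi

# Toy data drawn from two well-separated components.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])
print(em_two_gaussians(x))
</syntaxhighlight>

As the text above notes, a run like this converges to a local maximum (or saddle point) that depends on the starting values; with a poor initialization it can also approach a degenerate solution in which one component collapses onto a single data point.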
For any <math>\mathbf{Z}</math> with non-zero probability <math>p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)</math>, we can write

<math>
\log p(\mathbf{X}\mid\boldsymbol\theta) = \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) - \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta).
</math>

This identity is simply the product rule <math>p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta)\,p(\mathbf{X}\mid\boldsymbol\theta)</math>, written in logarithms and rearranged.
We take the expectation over possible values of the unknown data <math>\mathbf{Z}</math> under the current parameter estimate <math>\theta^{(t)}</math> by multiplying both sides by <math>p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)})</math> and summing (or integrating) over <math>\mathbf{Z}</math>.  The left-hand side is the expectation of a constant, so we get:
 
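<math>
\log p(\mathbf{X}\mid\boldsymbol\theta) =
\sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) \log p(\mathbf{X},\mathbf{Z}\mid\boldsymbol\theta)
- \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta^{(t)}) \log p(\mathbf{Z}\mid\mathbf{X},\boldsymbol\theta),
</math>

where the sums are understood as integrals when <math>\mathbf{Z}</math> is continuous.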