where <math>p_{Z\mid X}(\cdot\mid x;\theta)</math> is the conditional distribution of the unobserved data given the observed data <math>x</math> and <math>D_{KL}</math> is the [[Kullback–Leibler divergence]].

Then the steps in the EM algorithm may be viewed as:

Expectation step: Choose <math>q</math> to maximize <math>F</math>:
<math> q^{(t)} = \operatorname{arg\,max}_q \ F(q,\theta^{(t)}) </math>

Maximization step: Choose <math>\theta</math> to maximize <math>F</math>:
<math> \theta^{(t+1)} = \operatorname{arg\,max}_\theta \ F(q^{(t)},\theta) </math>
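
For readers who prefer code to notation, the two maximizations can be sketched for one simple concrete case. The following is a minimal illustrative sketch only, assuming a two-component univariate Gaussian mixture; the function names (<code>e_step</code>, <code>m_step</code>) are hypothetical and the updates are not taken from any particular library. For fixed <math>\theta</math>, the <math>F</math>-maximizing <math>q</math> is the posterior over the latent labels; for fixed <math>q</math>, the <math>F</math>-maximizing <math>\theta</math> is the responsibility-weighted maximum-likelihood update.

<syntaxhighlight lang="python">
import numpy as np

def e_step(x, means, stds, weights):
    # Expectation step: with theta fixed, the q that maximizes F is the
    # posterior p(z | x; theta), i.e. each component's "responsibility".
    pdf = lambda v, m, s: np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    joint = np.stack([w * pdf(x, m, s) for w, m, s in zip(weights, means, stds)])
    return joint / joint.sum(axis=0)  # q[k, i] = p(z_i = k | x_i; theta)

def m_step(x, q):
    # Maximization step: with q fixed, the theta that maximizes F is the
    # responsibility-weighted maximum-likelihood estimate.
    nk = q.sum(axis=1)
    weights = nk / nk.sum()
    means = (q @ x) / nk
    stds = np.sqrt((q * (x - means[:, None]) ** 2).sum(axis=1) / nk)
    return means, stds, weights
</syntaxhighlight>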

==Introduction==

The EM algorithm is used to find (local) [[maximum likelihood]] parameters of a [[statistical model]] in cases where the equations cannot be solved directly. Typically these models involve [[latent variable]]s in addition to unknown [[parameters]] and known data observations. That is, either [[missing values]] exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points. For example, a [[mixture model]] can be described more simply by assuming that each observed data point has a corresponding unobserved data point, or latent variable, specifying the mixture component to which each data point belongs.
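
As a concrete illustration of this latent-variable formulation, each observation can be thought of as generated together with an unobserved label naming the component that produced it, while EM sees only the observations. The sketch below is an assumption made purely for illustration (a two-component Gaussian mixture with arbitrary parameter values), not a model discussed elsewhere in this article.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-component Gaussian mixture: each x_i is generated together
# with a latent label z_i identifying the component that produced it.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([1.0, 0.5])

z = rng.choice(2, size=500, p=weights)   # latent variable: unobserved in practice
x = rng.normal(means[z], stds[z])        # observed data: all that EM gets to see
</syntaxhighlight>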

Finding a maximum likelihood solution typically requires taking the [[derivative]]s of the [[likelihood function]] with respect to all the unknown values, the parameters and the latent variables, and simultaneously solving the resulting equations. In statistical models with latent variables, this is usually impossible. Instead, the result is typically a set of interlocking equations in which the solution to the parameters requires the values of the latent variables and vice versa, but substituting one set of equations into the other produces an unsolvable equation.
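
To see how the equations interlock, consider a <math>K</math>-component Gaussian mixture (an assumption made here only for illustration), and write <math>\gamma_{ik}</math> for the posterior responsibility of component <math>k</math> for observation <math>x_i</math>. Setting the derivative of the log-likelihood with respect to a component mean <math>\mu_k</math> to zero gives

<math> \sum_{i=1}^n \gamma_{ik}\,\frac{x_i - \mu_k}{\sigma_k^2} = 0, \qquad \gamma_{ik} = \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^K \pi_j\,\mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}. </math>

Solving for <math>\mu_k</math> requires the responsibilities <math>\gamma_{ik}</math>, but the <math>\gamma_{ik}</math> themselves depend on <math>\mu_k</math> and on every other parameter, so substituting one relation into the other yields no closed-form solution.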

The EM algorithm proceeds from the observation that there is a way to solve these two sets of equations numerically. One can simply pick arbitrary values for one of the two sets of unknowns, use them to estimate the second set, then use these new values to find a better estimate of the first set, and then keep alternating between the two until the resulting values both converge to fixed points. It's not obvious that this will work, but it can be proven that in this context it does, and that the derivative of the likelihood is (arbitrarily close to) zero at that point, which in turn means that the point is either a maximum or a [[saddle point]].<ref name="Wu" /> In general, multiple maxima may occur, with no guarantee that the global maximum will be found. Some likelihoods also have [[Mathematical singularity|singularities]] in them, i.e., nonsensical maxima. For example, one of the ''solutions'' that may be found by EM in a mixture model involves setting one of the components to have zero variance and the mean parameter for the same component to be equal to one of the data points.
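
Continuing the illustrative two-component sketch from above (again an assumption made only for this example, reusing the hypothetical <code>e_step</code> and <code>m_step</code> functions and the simulated data <code>x</code>), the alternation described here amounts to a short loop that starts from arbitrary parameter guesses and stops when the estimates no longer move:

<syntaxhighlight lang="python">
import numpy as np

# Arbitrary starting values for one set of unknowns (the parameters).
means = np.array([-1.0, 1.0])
stds = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

for _ in range(500):
    q = e_step(x, means, stds, weights)              # estimate the latent labels from theta
    new_means, new_stds, new_weights = m_step(x, q)  # re-estimate theta from those labels
    converged = np.allclose(new_means, means, atol=1e-8)
    means, stds, weights = new_means, new_stds, new_weights
    if converged:                                    # fixed point: a (local) maximum or saddle
        break
</syntaxhighlight>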

==Description==