== Statistical inference ==
=== Parameter estimation ===
{{See also|Poisson regression}}
Given a sample of ''n'' measured values <math> k_i \in \{0,1,...\}</math>, for ''i''&nbsp;=&nbsp;1,&nbsp;...,&nbsp;''n'', we wish to estimate the value of the parameter ''λ'' of the Poisson population from which the sample was drawn. The [[maximum likelihood]] estimate is <ref>{{cite web |last=Paszek|first=Ewa |title=Maximum Likelihood Estimation – Examples |url = http://cnx.org/content/m13500/latest/?collection=col10343/latest}}</ref>
      
: <math>\widehat{\lambda}_\mathrm{MLE}=\frac{1}{n}\sum_{i=1}^n k_i. \!</math>
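As a minimal sketch (the sample values below are illustrative, not from the article), the estimate is simply the sample mean of the observed counts:

```python
# Minimal sketch: the MLE of lambda is the sample mean of the counts.
# The sample below is a made-up illustration.
k = [2, 0, 3, 1, 2, 4, 1, 3]  # hypothetical observed counts

lambda_mle = sum(k) / len(k)
print(lambda_mle)  # 16 events over 8 samples -> 2.0
```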
 
Since each observation has expectation λ, so does the sample mean. Therefore, the maximum likelihood estimate is an [[unbiased estimator]] of λ. It is also an efficient estimator since its variance achieves the [[Cramér–Rao lower bound]] (CRLB).{{Citation needed|date=April 2012}} Hence it is [[Minimum-variance unbiased estimator|minimum-variance unbiased]]. It can also be proven that the sum (and hence the sample mean, being a one-to-one function of the sum) is a complete and sufficient statistic for λ.
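A small simulation (a sketch under assumed parameters; the sampler uses Knuth's classic method for generating Poisson variates) illustrates that the sample mean is unbiased and that its variance is close to the CRLB <math>\lambda/n</math>:

```python
import math
import random

# Sketch: empirical check that the sample mean is unbiased for lambda and that
# its variance is near the Cramer-Rao bound lambda/n. Parameters are assumptions.
random.seed(0)

def poisson_sample(lam):
    # Knuth's method: count uniform draws until their product falls below e^-lam.
    limit = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= random.random()
        if product <= limit:
            return count
        count += 1

lam, n, reps = 3.0, 50, 2000
estimates = [sum(poisson_sample(lam) for _ in range(n)) / n for _ in range(reps)]

mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps
print(mean_est)  # close to lam = 3.0
print(var_est)   # close to lam / n = 0.06
```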
To prove sufficiency we may use the [[Sufficient statistic|factorization theorem]]. Consider partitioning the probability mass function of the joint Poisson distribution for the sample into two parts: one that depends solely on the sample <math>\mathbf{x}</math> (called <math>h(\mathbf{x})</math>) and one that depends on the parameter <math>\lambda</math> and the sample <math>\mathbf{x}</math> only through the function <math>T(\mathbf{x})</math>. Then <math>T(\mathbf{x})</math> is a sufficient statistic for <math>\lambda</math>.
    
: <math> P(\mathbf{x})=\prod_{i=1}^n\frac{\lambda^{x_i} e^{-\lambda}}{x_i!}=\frac{1}{\prod_{i=1}^n x_i!} \times \lambda^{\sum_{i=1}^n x_i}e^{-n\lambda} </math>
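The factorization can be checked numerically on a toy sample (the sample and λ below are assumptions chosen for illustration):

```python
import math

# Sketch: verify P(x) = h(x) * g(T(x) | lambda) on an illustrative sample.
x = [1, 0, 2, 3]
lam = 1.5
n = len(x)

joint = math.prod(lam ** xi * math.exp(-lam) / math.factorial(xi) for xi in x)

h = 1 / math.prod(math.factorial(xi) for xi in x)  # depends on x only
T = sum(x)                                         # the sufficient statistic
g = lam ** T * math.exp(-n * lam)                  # depends on x via T only

print(math.isclose(joint, h * g))  # True
```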
 
The first term, <math>h(\mathbf{x})</math>, depends only on <math>\mathbf{x}</math>. The second term, <math>g(T(\mathbf{x})|\lambda)</math>, depends on the sample only through <math>T(\mathbf{x})=\sum_{i=1}^nx_i</math>. Thus, <math>T(\mathbf{x})</math> is sufficient.
To find the parameter λ that maximizes the probability function for the Poisson population, we can use the logarithm of the likelihood function:
: <math> \begin{align} \ell(\lambda) & = \ln \prod_{i=1}^n f(k_i \mid \lambda) \\ & = \sum_{i=1}^n \ln\!\left(\frac{e^{-\lambda}\lambda^{k_i}}{k_i!}\right) \\ & = -n\lambda + \left(\sum_{i=1}^n k_i\right) \ln(\lambda) - \sum_{i=1}^n \ln(k_i!). \end{align} </math>
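As a sketch (illustrative sample, coarse grid), evaluating this log-likelihood over a grid of candidate values shows the peak at the sample mean:

```python
import math

# Sketch: the log-likelihood above, evaluated on an illustrative sample;
# a grid search peaks exactly at the sample mean.
k = [2, 1, 4, 0, 3]
n = len(k)

def log_likelihood(lam):
    return (-n * lam
            + sum(k) * math.log(lam)
            - sum(math.log(math.factorial(ki)) for ki in k))

grid = [i / 100 for i in range(1, 801)]  # lambda in (0, 8]
best = max(grid, key=log_likelihood)
print(best)  # sample mean: 10 / 5 = 2.0
```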
 
We take the derivative of <math>\ell</math> with respect to ''λ'' and set it equal to zero:
: <math>\frac{\mathrm{d}}{\mathrm{d}\lambda} \ell(\lambda) = 0 \iff -n + \left(\sum_{i=1}^n k_i\right) \frac{1}{\lambda} = 0. \!</math>
 
Solving for ''λ'' gives a stationary point.
: <math> \lambda = \frac{\sum_{i=1}^n k_i}{n}</math>
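A quick numerical check (illustrative counts) confirms that the score <math>-n + \left(\sum_{i=1}^n k_i\right)/\lambda</math> vanishes at the sample mean and changes sign there:

```python
# Sketch: the score -n + (sum k_i) / lambda is zero at lambda = mean(k)
# and changes sign there; the counts are illustrative.
k = [2, 1, 4, 0, 3]
n, s = len(k), sum(k)

def score(lam):
    return -n + s / lam

lam_hat = s / n
print(score(lam_hat))            # 0.0
print(score(lam_hat - 0.5) > 0)  # True: likelihood increasing below lam_hat
print(score(lam_hat + 0.5) < 0)  # True: likelihood decreasing above lam_hat
```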
 
So ''λ'' is the average of the ''k''<sub>''i''</sub> values. The sign of the second derivative of <math>\ell</math> at the stationary point determines what kind of extremum ''λ'' is.
: <math>\frac{\partial^2 \ell}{\partial \lambda^2} = -\lambda^{-2}\sum_{i=1}^n k_i </math>
 
Evaluating the second derivative ''at the stationary point'' gives:
: <math>\frac{\partial^2 \ell}{\partial \lambda^2} = - \frac{n^2}{\sum_{i=1}^n k_i} </math>
 
which is the negative of ''n'' times the reciprocal of the average of the k<sub>i</sub>. This expression is negative when the average is positive. If this is satisfied, then the stationary point maximizes the probability function.
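On the same kind of illustrative sample, the second derivative at the stationary point is indeed negative:

```python
# Sketch: at the stationary point the second derivative equals -n^2 / sum(k_i),
# which is negative whenever at least one count is positive (counts illustrative).
k = [2, 1, 4, 0, 3]
n = len(k)

second_derivative = -n ** 2 / sum(k)
print(second_derivative)      # -25 / 10 = -2.5
print(second_derivative < 0)  # True: the stationary point is a maximum
```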
For [[Completeness (statistics)|completeness]], a family of distributions is said to be complete if and only if <math> E(g(T)) = 0</math>  implies that <math>P_\lambda(g(T) = 0) = 1</math> for all <math>\lambda</math>. If the individual <math>X_i</math> are iid <math>\mathrm{Po}(\lambda)</math>, then <math>T(\mathbf{x})=\sum_{i=1}^nX_i\sim \mathrm{Po}(n\lambda)</math>. Knowing the distribution we want to investigate, it is easy to see that the statistic is complete.
: <math>E(g(T))=\sum_{t=0}^\infty g(t)\frac{(n\lambda)^te^{-n\lambda}}{t!}=0</math>
 
For this equality to hold, <math>g(t)</math> must be 0. This follows from the fact that none of the other terms will be 0 for all <math>t</math> in the sum and for all possible values of <math>\lambda</math>. Hence, <math> E(g(T)) = 0</math> for all <math>\lambda</math> implies that <math>P_\lambda(g(T) = 0) = 1</math>, and the statistic has been shown to be complete.
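As an illustration of why (the indicator function ''g'' and the parameters below are assumptions for the sketch), a ''g'' that is nonzero at even a single point already fails the defining condition, since <math>E(g(T))</math> is then nonzero for any positive <math>\lambda</math>:

```python
import math

# Sketch: if g(t) = 1 only at t = 2 (an assumed, nonzero g), then
# E[g(T)] = (n*lam)^2 * e^{-n*lam} / 2! > 0 for any lam > 0,
# so such a g cannot satisfy E[g(T)] = 0 for all lam.
def expected_g(lam, n=4):
    return sum((1 if t == 2 else 0)
               * (n * lam) ** t * math.exp(-n * lam) / math.factorial(t)
               for t in range(100))

print(expected_g(0.5) > 0)  # True: completeness fails for this g
```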