Learning Bayesian networks with bounded treewidth is necessary to allow exact, tractable inference, since the worst-case inference complexity is exponential in the treewidth k (under the exponential time hypothesis). Yet, as a global property of the graph, it considerably increases the difficulty of the learning process. In this context it is possible to use K-trees for effective learning.
==Statistical introduction==
    
{{Main|Bayesian statistics}}
 
Given data <math>x\,\!</math> and parameter <math>\theta</math>, a simple Bayesian analysis starts with a prior probability (prior) <math>p(\theta)</math> and likelihood <math>p(x\mid\theta)</math> to compute a posterior probability <math>p(\theta\mid x) \propto p(x\mid\theta)p(\theta)</math>.
 
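As a minimal illustration of this proportionality (a sketch under assumed data, not part of the original article), the unnormalized posterior can be evaluated pointwise on a grid of parameter values; the Bernoulli likelihood, flat prior, and counts below are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

# Grid of candidate values for theta in (0, 1).
theta = np.linspace(0.001, 0.999, 999)

# Illustrative data: 7 successes in 10 Bernoulli trials (assumed, not from the article).
successes, trials = 7, 10

prior = np.ones_like(theta)                                        # flat p(theta)
likelihood = theta**successes * (1 - theta)**(trials - successes)  # p(x | theta)

# p(theta | x) is proportional to p(x | theta) * p(theta); normalize over the grid.
posterior = likelihood * prior
posterior /= posterior.sum() * (theta[1] - theta[0])
</syntaxhighlight>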
Often the prior on <math>\theta</math> depends in turn on other parameters <math>\varphi</math> that are not mentioned in the likelihood. So, the prior <math>p(\theta)</math> must be replaced by a likelihood <math>p(\theta\mid \varphi)</math>, and a prior <math>p(\varphi)</math> on the newly introduced parameters <math>\varphi</math> is required, resulting in a posterior probability
 
 
 
      
: <math>p(\theta,\varphi\mid x) \propto p(x\mid\theta)p(\theta\mid\varphi)p(\varphi).</math>
 
      
This is the simplest example of a ''hierarchical Bayes model''.{{clarify|date=October 2009|reason=what makes it hierarchical? Are we talking [[hierarchy (mathematics)]] or [[hierarchical structure]]? Link to whichever one it is.}}
 
The process may be repeated; for example, the parameters <math>\varphi</math> may depend in turn on additional parameters <math>\psi\,\!</math>, which require their own prior. Eventually the process must terminate, with priors that do not depend on unmentioned parameters.
 
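In code, a hierarchical model of this kind is often handled through its log density, since the factorization <math>p(x\mid\theta)p(\theta\mid\varphi)p(\varphi)</math> becomes a sum of log terms. The following sketch assumes Gaussian forms for every factor purely for illustration; the function names and parameter values are hypothetical, not from the article.

<syntaxhighlight lang="python">
import math

def log_normal_pdf(y, mu, sd):
    """Log density of N(mu, sd^2) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (y - mu)**2 / (2 * sd**2)

def log_unnormalized_posterior(theta, phi, x, sigma=1.0):
    """log p(x|theta) + log p(theta|phi) + log p(phi), up to an additive constant.

    Gaussian choices throughout are assumptions for the sketch: the
    likelihood, the level-one prior p(theta|phi), and a wide Gaussian
    standing in for the top-level prior p(phi).
    """
    log_lik = sum(log_normal_pdf(xi, theta, sigma) for xi in x)
    log_p_theta = log_normal_pdf(theta, phi, 1.0)   # p(theta | phi)
    log_p_phi = log_normal_pdf(phi, 0.0, 10.0)      # p(phi)
    return log_lik + log_p_theta + log_p_phi
</syntaxhighlight>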
 
===Introductory examples===
    
{{Expand section|date=March 2009|reason=More examples needed}}
 
Given the measured quantities <math>x_1,\dots,x_n\,\!</math>, each with normally distributed errors of known standard deviation <math>\sigma\,\!</math>,
 
 
 
      
: <math>
x_i \sim N(\theta_i, \sigma^2)
</math>
      
Suppose we are interested in estimating the <math>\theta_i</math>. An approach would be to estimate the <math>\theta_i</math> using a [[maximum likelihood]] approach; since the observations are independent, the likelihood factorizes and the maximum likelihood estimate is simply
 
 
 
      
: <math>
\theta_i = x_i.
</math>
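Because the Gaussian log-likelihood for each observation, <math>-(x_i-\theta_i)^2/(2\sigma^2)</math> up to a constant, peaks exactly at <math>\theta_i = x_i</math>, no optimization is needed. A brief numerical check (with made-up measurements, an assumption for this sketch) confirms this:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative measurements, one per theta_i (assumed values).
x = np.array([2.3, -0.7, 1.1, 0.4])

# Closed form: each per-observation likelihood is maximized at theta_i = x_i.
theta_mle = x.copy()

# Numerical check: grid-search each log-likelihood. The constant factor
# 1/(2 sigma^2) does not move the maximum, so it is omitted here.
grid = np.linspace(-5.0, 5.0, 10001)
numeric_mle = grid[np.argmax(-(x[:, None] - grid[None, :]) ** 2, axis=1)]
assert np.allclose(numeric_mle, theta_mle, atol=1e-3)
</syntaxhighlight>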
      
However, if the quantities are related, so that for example the individual <math>\theta_i</math> have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,
 
 
 
      
: <math>
x_i \sim N(\theta_i,\sigma^2),
</math>
: <math>
\theta_i\sim N(\varphi, \tau^2),
</math>
      
with [[improper prior]]s <math>\varphi\sim\text{flat}</math>, <math>\tau\sim\text{flat} \in (0,\infty)</math>. When <math>n\ge 3</math>, this is an ''identified model'' (i.e. there exists a unique solution for the model's parameters), and the posterior distributions of the individual <math>\theta_i</math> will tend to move, or ''[[Shrinkage estimator|shrink]]'' away from the maximum likelihood estimates towards their common mean. This ''shrinkage'' is a typical behavior in hierarchical Bayes models.
 
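The shrinkage can be made visible with a simplified sketch: conditional on <math>\varphi</math> and <math>\tau</math>, the posterior mean of each <math>\theta_i</math> is a precision-weighted average of <math>x_i</math> and <math>\varphi</math>. The plug-in values below (sample mean for <math>\varphi</math>, a fixed <math>\tau</math>, made-up data) are assumptions standing in for the full posterior under the improper priors.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative measurements x_i ~ N(theta_i, sigma^2) with known sigma (assumed data).
x = np.array([2.8, 0.8, -1.7, 1.4, 3.0, -0.6])
sigma, tau = 1.0, 1.5        # tau fixed for illustration
phi = x.mean()               # plug-in estimate of the common mean

# Conditional on (phi, tau), the posterior mean of theta_i is the
# precision-weighted average of x_i and phi:
#   E[theta_i | x, phi, tau] = w * x_i + (1 - w) * phi,
#   with weight w = tau^2 / (tau^2 + sigma^2).
w = tau**2 / (tau**2 + sigma**2)
theta_post_mean = w * x + (1 - w) * phi

# Each estimate lies between x_i and phi: shrinkage toward the common mean.
assert np.all(np.abs(theta_post_mean - phi) <= np.abs(x - phi))
</syntaxhighlight>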
===Restrictions on priors===
    
Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy such as the variable <math>\tau\,\!</math> in the example. The usual priors such as the [[Jeffreys prior]] often do not work, because the posterior distribution will not be normalizable and estimates made by minimizing the [[Loss function#Expected loss|expected loss]] will be [[admissible decision rule|inadmissible]].
 
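As a sketch of the standard argument behind this (not spelled out in the original text): in the example above, the marginal likelihood <math>p(x\mid\tau)</math> tends to a positive constant as <math>\tau\to 0</math>, so under a Jeffreys-type choice <math>p(\tau)\propto 1/\tau</math> the posterior mass near zero behaves as

: <math>\int_0^{\epsilon} p(\tau\mid x)\,d\tau \;\propto\; \int_0^{\epsilon} p(x\mid\tau)\,\frac{d\tau}{\tau} = \infty,</math>

and the posterior cannot be normalized.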
    
==Definitions and concepts==