更改

删除429字节 、 2020年10月29日 (四) 00:04
第191行: 第191行:       −
====推断未观测变量===
+
===推断未观测变量===
    
Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the ''evidence'' variables) are observed. This process of computing the ''posterior'' distribution of variables given evidence is called probabilistic inference. The posterior gives a universal [[sufficient statistic]] for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying [[Bayes' theorem]] to complex problems.
 
Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the ''evidence'' variables) are observed. This process of computing the ''posterior'' distribution of variables given evidence is called probabilistic inference. The posterior gives a universal [[sufficient statistic]] for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying [[Bayes' theorem]] to complex problems.
第197行: 第197行:  
Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.
 
Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems.
   −
因为'''<font color="#ff8000"> 贝叶斯网络Bayesian networks</font>'''是变量及其关系的完整模型,所以它可以用来回答关于变量的概率查询。例如,当观察到其他变量(证据变量)时,网络可用于更新变量子集的状态知识。这个计算给定证据的变量后验概率的过程被称为概率推理。后验方法为检测应用提供了一个通用的充分的统计量,当为变量子集选择值时,可以最小化一些期望损失函数,例如决策错误的概率。因此,'''<font color="#ff8000"> 贝叶斯网络Bayesian networks</font>'''可以被看作是一种自动应用贝叶斯定理解决复杂问题的机制。
+
因为贝叶斯网络是变量及其关系的完整模型,所以它可以用来回答关于变量的概率问题。例如,当网络中某些变量被观测到时,也就是其确切值已知的情况下,我们可以更新对网络中其他未观测变量的认知。这种给定一些证据,然后计算的变量的后验概率的过程被称为概率推理。有时我们需要为某些变量赋值的同时最小化期望损失,此时后验概率为一些检测应用,例如检测决策错误的概率,提供了一个通用且充分的统计量。因此,贝叶斯网络可以被看作是一种应用贝叶斯定理解决复杂问题的自动化机制。
      第205行: 第205行:  
The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at one time and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which allow for a space–time tradeoff and match the efficiency of variable elimination when enough space is used. All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference algorithms are importance sampling, stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation, generalized belief propagation and variational methods.
 
The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the product; clique tree propagation, which caches the computation so that many variables can be queried at one time and new evidence can be propagated quickly; and recursive conditioning and AND/OR search, which allow for a space–time tradeoff and match the efficiency of variable elimination when enough space is used. All of these methods have complexity that is exponential in the network's treewidth. The most common approximate inference algorithms are importance sampling, stochastic MCMC simulation, mini-bucket elimination, loopy belief propagation, generalized belief propagation and variational methods.
   −
最常用的精确推理方法有: 变量消除法,通过积分或求和的方式将未观察到的非查询变量逐一消除; 分支树传播法,它将计算过程缓存,使得可以同时查询多个变量,并快速传播新的证据; 递归条件化和 / 或搜索法,它考虑了时空折衷,并且在使用足够空间时与变量消除法的效率相匹配。所有这些方法的复杂度都是网络树宽的指数级。最常见的近似推理算法有重要性抽样法、随机 MCMC 模拟法、小桶消去法、循环置信传播法、广义置信传播法和变分法。
+
最常用的精确推理方法有: '''<font color="#ff8000">变量消除法</font>''',即通过积分或求和的方式将未观察到的非查询变量逐一消除; '''<font color="#ff8000">团树传播算法</font>''',它将计算过程缓存,可以同时查询多个变量,并快速传播新的证据; '''<font color="#ff8000">递归条件化 AND/OR 搜索法</font>''',它考虑了时间和空间取舍,并且在使用足够的空间时拥有与变量消除法相比拟的时间效率。所有这些方法的复杂度都随网络树宽的增长呈指数级上升。而最常见的近似推理算法有'''<font color="#ff8000">重要性抽样法、随机 MCMC 模拟法、小桶消去法、循环置信传播法、广义置信传播法</font>'''和'''<font color="#ff8000">变分法</font>'''。
      第215行: 第215行:  
In order to fully specify the Bayesian network and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution for X conditional upon Xs parents. The distribution of X conditional upon its parents may have any form. It is common to work with discrete or Gaussian distributions since that simplifies calculations. Sometimes only constraints on a distribution are known; one can then use the principle of maximum entropy to determine a single distribution, the one with the greatest entropy given the constraints. (Analogously, in the specific context of a dynamic Bayesian network, the conditional distribution for the hidden state's temporal evolution is commonly specified to maximize the entropy rate of the implied stochastic process.)
 
In order to fully specify the Bayesian network and thus fully represent the joint probability distribution, it is necessary to specify for each node X the probability distribution for X conditional upon Xs parents. The distribution of X conditional upon its parents may have any form. It is common to work with discrete or Gaussian distributions since that simplifies calculations. Sometimes only constraints on a distribution are known; one can then use the principle of maximum entropy to determine a single distribution, the one with the greatest entropy given the constraints. (Analogously, in the specific context of a dynamic Bayesian network, the conditional distribution for the hidden state's temporal evolution is commonly specified to maximize the entropy rate of the implied stochastic process.)
   −
为了完全指定'''<font color="#ff8000"> 贝叶斯网络Bayesian networks</font>'''节点,从而完全代表联合分布节点,有必要为每个节点 x 指定基于 x s 父节点的概率分布节点 x。以其父母为条件的 x 的分布可以有任何形式。通常用于离散或'''<font color="#ff8000">高斯分布Gaussian distributions </font>''',因为这简化了计算。有时只有分布上的约束是已知的; 然后可以使用最大熵原理分布来确定一个单一的分布,即给定约束条件下熵最大的分布。类似地,在动态'''<font color="#ff8000"> 贝叶斯网络Bayesian networks</font>'''的特定上下文中,隐状态时间演化的条件分布通常被指定为最大化隐含'''<font color="#ff8000"> 随机过程Stochastic process</font>'''的熵率
+
为了获得一个完整的贝叶斯网络,从而获得一个完整的联合概率分布,需要详知道每个节点 ''X'' 的条件概率分布。''X'' 的条件概率分布可以有各种各样的形式,最长见到、用到的分布是离散分布或高斯分布,因为计算简便。有时我们只知道一个分布应该符合哪些约束条件。此时可以使用最大熵原理来确定一个分布,即给定约束条件下熵最大的分布。类似地,在'''<font color="#ff8000">动态贝叶斯网络B</font>'''的特定上下文中,隐变量的条件分布通常也被认为是能最大化其随机过程的熵率的那一个分布。
      第223行: 第223行:  
Often these conditional distributions include parameters that are unknown and must be estimated from data, e.g., via the maximum likelihood approach. Direct maximization of the likelihood (or of the posterior probability) is often complex given unobserved variables. A classical approach to this problem is the expectation-maximization algorithm, which alternates computing expected values of the unobserved variables conditional on observed data, with maximizing the complete likelihood (or posterior) assuming that previously computed expected values are correct. Under mild regularity conditions this process converges on maximum likelihood (or maximum posterior) values for parameters.
 
Often these conditional distributions include parameters that are unknown and must be estimated from data, e.g., via the maximum likelihood approach. Direct maximization of the likelihood (or of the posterior probability) is often complex given unobserved variables. A classical approach to this problem is the expectation-maximization algorithm, which alternates computing expected values of the unobserved variables conditional on observed data, with maximizing the complete likelihood (or posterior) assuming that previously computed expected values are correct. Under mild regularity conditions this process converges on maximum likelihood (or maximum posterior) values for parameters.
   −
通常这些条件分布包括未知的参数,必须从数据中估计,例如,通过'''<font color="#ff8000"> 最大似然法Maximum likelihood approach</font>'''。直接最大化的可能性(或后验概率)往往是复杂的给定未观测的变量。这个问题的一个经典方法是期望最大化算法,它以观测数据为条件,交替计算未观测变量的期望值,并假设先前计算的期望值是正确的,最大化完全似然(或后验)。在温和的正则性条件下,这个过程收敛于参数的最大似然值(或最大后验值)。
+
通常这些条件分布包括未知的参数,必须从数据中估计出来,例如,使用'''<font color="#ff8000"> 最大似然估计</font>'''。给定未观测的变量,直接进行最大化似然(或后验概率)往往是很复杂的。这个问题的一个经典解决方案是'''<font color="#ff8000">EM算法</font>。它利用观测数据计算出未观测变量的期望值,并假设先前计算的期望值是正确的,然后最大化完全似然(或后验),求期望和最大化两个步骤交替迭代进行。在某些条件下,这个过程能收敛于待估计参数的最大似然值(或最大后验值)。
      第231行: 第231行:  
A more fully Bayesian approach to parameters is to treat them as additional unobserved variables and to compute a full posterior distribution over all nodes conditional upon observed data, then to integrate out the parameters. This approach can be expensive and lead to large dimension models, making classical parameter-setting approaches more tractable.
 
A more fully Bayesian approach to parameters is to treat them as additional unobserved variables and to compute a full posterior distribution over all nodes conditional upon observed data, then to integrate out the parameters. This approach can be expensive and lead to large dimension models, making classical parameter-setting approaches more tractable.
   −
一个更完整的贝叶斯参数方法是将它们视为附加的未观测变量,并根据观测数据计算所有节点的完整后验概率,然后整合出参数。这种方法可能代价高昂,并导致大规模模型,使经典的参数设置方法更易于处理。
+
一个更彻底的贝叶斯估计方法是将参数视为附加的未观测变量,并根据观测数据计算包含所有节点的完整后验概率分布,然后积分出参数。这种方法可能代价高昂,并得出一个大规模的模型,但能使经典的参数估计方法更易于处理。
      第243行: 第243行:  
In the simplest case, a Bayesian network is specified by an expert and is then used to perform inference. In other applications the task of defining the network is too complex for humans. In this case the network structure and the parameters of the local distributions must be learned from data.
 
In the simplest case, a Bayesian network is specified by an expert and is then used to perform inference. In other applications the task of defining the network is too complex for humans. In this case the network structure and the parameters of the local distributions must be learned from data.
   −
在最简单的情况下,专家指定一个'''<font color="#ff8000"> 贝叶斯网络Bayesian networks</font>''',然后用它来执行推理。在其他应用程序中,定义网络的任务对于人类来说过于复杂。在这种情况下,必须从数据中学习网络结构和局部分布的参数。
+
在最简单的情况下,一个贝叶斯网络可以有领域专家人工构建,然后用它来执行推理。在其他应用程序中,构建网络的任务对于人类来说过于复杂。这种情况就必须从数据中学习网络结构和各个变量的局部分布。
      第251行: 第251行:  
Automatically learning the graph structure of a Bayesian network (BN) is a challenge pursued within machine learning. The basic idea goes back to a recovery algorithm developed by Rebane and Pearl and rests on the distinction between the three possible patterns allowed in a 3-node DAG:
 
Automatically learning the graph structure of a Bayesian network (BN) is a challenge pursued within machine learning. The basic idea goes back to a recovery algorithm developed by Rebane and Pearl and rests on the distinction between the three possible patterns allowed in a 3-node DAG:
   −
自动学习'''<font color="#ff8000"> 贝叶斯网络Bayesian networks</font>'''的图形结构是机器学习的一个挑战。其基本思想可以追溯到由 Rebane 和 Pearl 开发的恢复算法,该算法基于三节点有向无环图中允许的三种可能模式的区别:
+
自动化学习出贝叶斯网络的图形结构是机器学习领域的一个挑战。其基本思想可以追溯到由 Rebane 和 Pearl 提出的恢复算法。该算法的基础是三节点有向无环图中的三种可能模式:
    
{| class="wikitable"
 
{| class="wikitable"
  −
{| class="wikitable"
  −
  −
{ | class“ wikitable”
  −
  −
|+Junction patterns
  −
   
|+Junction patterns
 
|+Junction patterns
  −
| + 连接模式
  −
   
!Pattern
 
!Pattern
  −
!Pattern
  −
  −
! 模式
  −
   
!Model
 
!Model
  −
!Model
  −
  −
! 模特
  −
  −
|-
  −
  −
|-
  −
   
|-
 
|-
  −
|Chain
  −
   
|Chain
 
|Chain
  −
链接
  −
  −
!<math>X \rightarrow Y \rightarrow Z</math>
  −
   
!<math>X \rightarrow Y \rightarrow Z</math>
 
!<math>X \rightarrow Y \rightarrow Z</math>
  −
! math x  right tarrow y  right tarrow z / math
  −
   
|-
 
|-
  −
|-
  −
  −
|-
  −
  −
|Fork
  −
   
|Fork  
 
|Fork  
  −
叉子
  −
   
|<math>X \leftarrow Y \rightarrow Z</math>
 
|<math>X \leftarrow Y \rightarrow Z</math>
  −
|<math>X \leftarrow Y \rightarrow Z</math>
  −
  −
| math x  left tarrow y  right tarrow z / math
  −
   
|-
 
|-
  −
|-
  −
  −
|-
  −
  −
|Collider
  −
   
|Collider  
 
|Collider  
  −
碰撞器
  −
  −
|<math>X \rightarrow Y \leftarrow Z</math>
  −
   
|<math>X \rightarrow Y \leftarrow Z</math>
 
|<math>X \rightarrow Y \leftarrow Z</math>
  −
| math x  right tarrow y  leftarrow z / math
  −
  −
|}
  −
  −
|}
  −
   
|}
 
|}
   第341行: 第272行:  
The first 2 represent the same dependencies (<math>X</math> and <math>Z</math> are independent given <math>Y</math>) and are, therefore, indistinguishable. The collider, however, can be uniquely identified, since <math>X</math> and <math>Z</math> are marginally independent and all other pairs are dependent. Thus, while the skeletons (the graphs stripped of arrows) of these three triplets are identical, the directionality of the arrows is partially identifiable. The same distinction applies when <math>X</math> and <math>Z</math> have common parents, except that one must first condition on those parents. Algorithms have been developed to systematically determine the skeleton of the underlying graph and, then, orient all arrows whose directionality is dictated by the conditional independences observed.
 
The first 2 represent the same dependencies (<math>X</math> and <math>Z</math> are independent given <math>Y</math>) and are, therefore, indistinguishable. The collider, however, can be uniquely identified, since <math>X</math> and <math>Z</math> are marginally independent and all other pairs are dependent. Thus, while the skeletons (the graphs stripped of arrows) of these three triplets are identical, the directionality of the arrows is partially identifiable. The same distinction applies when <math>X</math> and <math>Z</math> have common parents, except that one must first condition on those parents. Algorithms have been developed to systematically determine the skeleton of the underlying graph and, then, orient all arrows whose directionality is dictated by the conditional independences observed.
   −
前2个表示相同的依赖关系(数学 x / math 和数学 z / math 在给定数学 y / math 时是独立的) ,因此无法区分。然而,碰撞器可以被唯一地标识,因为数学 x / math 和数学 z / math 是边际独立的,而其他所有对都是依赖的。因此,虽然这三个三元组的框架(去掉箭头的图)是相同的,箭头的方向是部分可识别的。当数学 x / math 和数学 z / math 有着共同的父母时,同样的区别也适用,除了父母必须有第一个条件。算法已经发展到系统地确定基础图的框架,然后由条件独立观察定向所有箭头的方向。
+
前2个种模式表示的依赖关系是相同的(在给定<math>Y</math> 时,<math>X</math> 和<math>Z</math>是独立的) ,因此无法区分。然而,第三种模式确实可以被识别的,因为<math>X</math> 和<math>Z</math>是边际独立的,而其他所有变量对都是依赖的。因此,虽然这三个节点的连接方式和前两种模式相同,但这种模式中箭头的方向是部分可识别的。当<math>X</math> 和<math>Z</math>有着共同的父母时,这种不同也能被识别。这个算法已经发展到能系统地确定图的框架,然后由从数据中观察到的条件独立性确定所有箭头的方向。
     
370

个编辑