This entry is an initial translation by 水流心不竞 and has not yet been proofread; we apologize for any reading inconvenience.
    
{{Merge|Causal graph|discuss=Talk:Causal graph#Merge to Bayesian network|date=March 2020}}
A Bayesian network, Bayes network, belief network, decision network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor.  For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
 
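As a minimal sketch of this diagnostic use, assuming a hypothetical two-node network Disease → Symptom with made-up probabilities, Bayes' theorem recovers the posterior probability of the disease once the symptom is observed:

<syntaxhighlight lang="python">
# Minimal sketch: a hypothetical two-node network Disease -> Symptom.
# All probabilities are made up for illustration.
p_disease = 0.01                     # prior P(D = true)
p_symptom_given = {True: 0.90,       # P(S = true | D = true)
                   False: 0.05}      # P(S = true | D = false)

# Marginal P(S = true), summing over both disease states
p_symptom = (p_symptom_given[True] * p_disease
             + p_symptom_given[False] * (1 - p_disease))

# Posterior P(D = true | S = true) by Bayes' theorem
posterior = p_symptom_given[True] * p_disease / p_symptom
print(f"P(disease | symptom) = {posterior:.3f}")  # ~0.154
</syntaxhighlight>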
Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
 
==Graphical model==
    
Formally, Bayesian networks are [[Directed acyclic graph|directed acyclic graphs]] (DAGs) whose nodes represent variables in the [[Bayesian probability|Bayesian]] sense: they may be observable quantities, [[latent variable]]s, unknown parameters or hypotheses. Edges represent conditional dependencies; nodes that are not connected (no path connects one node to another) represent variables that are [[conditional independence|conditionally independent]] of each other. Each node is associated with a [[probability function]] that takes, as input, a particular set of values for the node's [[Glossary of graph theory#Directed acyclic graphs|parent]] variables, and gives (as output) the probability (or probability distribution, if applicable) of the variable represented by the node. For example, if <math>m</math> parent nodes represent <math>m</math> [[Boolean data type|Boolean variables]], then the probability function could be represented by a table of <small><math>2^m</math></small> entries, one entry for each of the <small><math>2^m</math></small> possible parent combinations. Similar ideas may be applied to undirected, and possibly cyclic, graphs such as [[Markov network]]s.
 
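To make the table size concrete, the following sketch (hypothetical node names and probabilities) builds such a conditional probability table for a Boolean node with <math>m</math> Boolean parents; it has exactly <math>2^m</math> rows:

<syntaxhighlight lang="python">
from itertools import product

# Sketch: a CPT for a Boolean node with m Boolean parents, one entry per
# combination of parent values. Probabilities here are placeholders.
m = 2
cpt = {combo: 0.5 for combo in product([False, True], repeat=m)}
cpt[(True, False)] = 0.8  # e.g. P(node = true | parent1 = true, parent2 = false)

assert len(cpt) == 2 ** m  # 2^m rows, one per parent combination
</syntaxhighlight>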
==Example==
    
[[Image:SimpleBayesNet.svg|400px|thumb|right|A simple Bayesian network with [[conditional probability table]]s ]]
 
The joint probability function is:
 
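Assuming the figure shows the usual sprinkler example (''G'' = grass wet, ''S'' = sprinkler, ''R'' = rain, with edges ''R''→''S'', ''R''→''G'' and ''S''→''G''), it factorizes along the DAG as

:<math>\Pr(G,S,R) = \Pr(G \mid S,R)\,\Pr(S \mid R)\,\Pr(R).</math>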
==Inference and learning==
    
Bayesian networks perform three main inference tasks:
 
===Inferring unobserved variables===
    
Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the ''evidence'' variables) are observed. This process of computing the ''posterior'' distribution of variables given evidence is called probabilistic inference. The posterior gives a universal [[sufficient statistic]] for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying [[Bayes' theorem]] to complex problems.
 
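A minimal sketch of such a query: brute-force enumeration over the joint distribution of the sprinkler network, computing the posterior probability of rain given the evidence that the grass is wet. The CPT numbers are illustrative.

<syntaxhighlight lang="python">
from itertools import product

# Sketch: inference by enumeration on the sprinkler network
# (R -> S, R -> G, S -> G). CPT numbers are illustrative.
p_r = {True: 0.2, False: 0.8}                    # P(R)
p_s_given_r = {True: 0.01, False: 0.4}           # P(S = true | R)
p_g_given_sr = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}  # P(G = true | S, R)

def joint(g, s, r):
    """P(G=g, S=s, R=r) via the factorization P(G|S,R) P(S|R) P(R)."""
    pg = p_g_given_sr[(s, r)] if g else 1 - p_g_given_sr[(s, r)]
    ps = p_s_given_r[r] if s else 1 - p_s_given_r[r]
    return pg * ps * p_r[r]

# Posterior P(R = true | G = true): marginalize out the hidden variable S.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(f"P(rain | grass wet) = {num / den:.3f}")  # ~0.358
</syntaxhighlight>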
===Parameter learning===
    
In order to fully specify the Bayesian network and thus fully represent the [[joint probability distribution]], it is necessary to specify for each node ''X'' the probability distribution for ''X'' conditional upon ''X'''s parents. The distribution of ''X'' conditional upon its parents may have any form. It is common to work with discrete or [[normal distribution|Gaussian distributions]] since that simplifies calculations. Sometimes only constraints on a distribution are known; one can then use the [[principle of maximum entropy]] to determine a single distribution, the one with the greatest [[information entropy|entropy]] given the constraints. (Analogously, in the specific context of a [[dynamic Bayesian network]], the conditional distribution for the hidden state's temporal evolution is commonly specified to maximize the [[entropy rate]] of the implied stochastic process.)
 
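For the simplest setting mentioned above, discrete variables with fully observed data, one common approach (a sketch with hypothetical records, not the only method) is maximum likelihood: each CPT entry is a conditional relative frequency.

<syntaxhighlight lang="python">
from collections import Counter

# Sketch: maximum-likelihood CPT estimation for a Boolean child with one
# Boolean parent, from complete (fully observed) hypothetical records.
data = [(True, True), (True, False), (True, True),
        (False, False), (False, False), (True, True)]  # (parent, child)

pair_counts = Counter(data)
parent_counts = Counter(parent for parent, _ in data)

# P(child = true | parent) as a conditional relative frequency
cpt = {parent: pair_counts[(parent, True)] / parent_counts[parent]
       for parent in (True, False)}
print(cpt)  # {True: 0.75, False: 0.0}
</syntaxhighlight>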
===Structure learning===
An alternative method of structural learning uses optimization-based search. It requires a scoring function and a search strategy. A common scoring function is posterior probability of the structure given the training data, like the BIC or the BDeu. The time requirement of an exhaustive search returning a structure that maximizes the score is superexponential in the number of variables. A local search strategy makes incremental changes aimed at improving the score of the structure. A global search algorithm like Markov chain Monte Carlo can avoid getting trapped in local minima. Friedman et al. discuss using mutual information between variables and finding a structure that maximizes this. They do this by restricting the parent candidate set to k nodes and exhaustively searching therein.
 
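A sketch of the score-and-search idea from this paragraph: greedy hill-climbing over single edge additions with an acyclicity check. The scoring function below is a deliberately trivial placeholder; a real implementation would plug in BIC or BDeu evaluated on the training data.

<syntaxhighlight lang="python">
from itertools import permutations

def score(dag, data):
    """Placeholder score; substitute BIC/BDeu computed from `data`."""
    return -len(dag)  # hypothetical stand-in that just prefers sparsity

def creates_cycle(dag, edge):
    """Would adding `edge` = (u, v) create a directed cycle in `dag`?"""
    u, v = edge
    stack, seen = [v], set()
    while stack:  # check whether u is reachable from v
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(w for (x, w) in dag if x == node)
    return False

def hill_climb(variables, data):
    dag = set()  # start from the empty graph
    improved = True
    while improved:
        improved = False
        for edge in permutations(variables, 2):
            if edge not in dag and not creates_cycle(dag, edge):
                candidate = dag | {edge}
                if score(candidate, data) > score(dag, data):
                    dag, improved = candidate, True
    return dag

print(hill_climb(["rain", "sprinkler", "grass_wet"], data=None))
</syntaxhighlight>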
==Statistical introduction==
    
{{Main|Bayesian statistics}}
 
===Introductory examples===
    
{{Expand section|date=March 2009|reason=More examples needed}}
 
===Restrictions on priors===
    
Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy such as the variable <math>\tau\,\!</math> in the example. The usual priors such as the [[Jeffreys prior]] often do not work, because the posterior distribution will not be normalizable and estimates made by minimizing the [[Loss function#Expected loss|expected loss]] will be [[admissible decision rule|inadmissible]].
 
==Definitions and concepts==
    
{{See also|Glossary of graph theory#Directed acyclic graphs}}
 
===Factorization definition===
    
''X'' is a Bayesian network with respect to ''G'' if its joint [[probability density function]] (with respect to a [[product measure]]) can be written as a product of the individual density functions, conditional on their parent variables:{{sfn|Russell|Norvig|2003|p=496}}
 
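Written out, with <math>\operatorname{pa}(v)</math> denoting the set of parents of node <math>v</math> in ''G'', this condition is:

:<math>p(x) = \prod_{v \in V} p\left(x_v \mid x_{\operatorname{pa}(v)}\right).</math>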
===Local Markov property===
    
''X'' is a Bayesian network with respect to ''G'' if it satisfies the ''local Markov property'': each variable is [[Conditional independence|conditionally independent]] of its non-descendants given its parent variables:{{sfn|Russell|Norvig|2003|p=499}}
 
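In symbols, with <math>\operatorname{de}(v)</math> denoting the descendants of <math>v</math> (taken here to include <math>v</math> itself, so that <math>V \smallsetminus \operatorname{de}(v)</math> is the set of its non-descendants):

:<math>X_v \perp\!\!\!\perp X_{V \smallsetminus \operatorname{de}(v)} \mid X_{\operatorname{pa}(v)} \quad \text{for all } v \in V.</math>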
===Developing Bayesian networks===
    
Developing a Bayesian network often begins with creating a DAG ''G'' such that ''X'' satisfies the local Markov property with respect to ''G''. Sometimes this is a [[Causal graph|causal]] DAG. The conditional probability distributions of each variable given its parents in ''G'' are assessed. In many cases, in particular in the case where the variables are discrete, if the joint distribution of ''X'' is the product of these conditional distributions, then ''X'' is a Bayesian network with respect to ''G''.<ref>{{cite book |first=Richard E. |last=Neapolitan | name-list-format = vanc |title=Learning Bayesian networks |url={{google books |plainurl=y |id=OlMZAQAAIAAJ}} |year=2004 |publisher=Prentice Hall |isbn=978-0-13-012534-7 }}</ref>
 
===Markov blanket===
    
The [[Markov blanket]] of a node is the set of nodes consisting of its parents, its children, and any other parents of its children. The Markov blanket renders the node independent of the rest of the network; the joint distribution of the variables in the Markov blanket of a node is sufficient knowledge for calculating the distribution of the node. ''X'' is a Bayesian network with respect to ''G'' if every node is conditionally independent of all other nodes in the network, given its [[Markov blanket]].{{sfn|Russell|Norvig|2003|p=499}}
 
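A brief sketch of reading a Markov blanket off an edge list (the graph and node names are hypothetical, reusing the sprinkler example from above):

<syntaxhighlight lang="python">
# Sketch: Markov blanket of a node in a DAG given as (parent, child) edges.
edges = {("rain", "sprinkler"), ("rain", "grass_wet"),
         ("sprinkler", "grass_wet")}

def markov_blanket(node, edges):
    parents = {u for (u, v) in edges if v == node}
    children = {v for (u, v) in edges if u == node}
    spouses = {u for (u, v) in edges if v in children and u != node}
    return parents | children | spouses  # parents, children, co-parents

print(markov_blanket("sprinkler", edges))  # {'rain', 'grass_wet'}
</syntaxhighlight>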
===={{anchor|d-separation}}''d''-separation====
===Causal networks===
    
Although Bayesian networks are often used to represent [[causality|causal]] relationships, this need not be the case: a directed edge from ''u'' to ''v'' does not require that ''X<sub>v</sub>'' be causally dependent on ''X<sub>u</sub>''. This is demonstrated by the fact that Bayesian networks on the graphs:
 
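A two-variable instance makes the point concrete: the networks <math>a \rightarrow b</math> and <math>b \rightarrow a</math> encode exactly the same joint distribution, since by the chain rule

:<math>\Pr(a,b) = \Pr(b \mid a)\Pr(a) = \Pr(a \mid b)\Pr(b),</math>

so the direction of an edge, by itself, carries no causal commitment.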
==Inference complexity and approximation algorithms==
    
In 1990, while working at Stanford University on large bioinformatic applications, Cooper proved that exact inference in Bayesian networks is [[NP-hard]].<ref>
 
==Software==
    
<!--Entries in this list should be "notable" with a sourced Wikipedia article. See WP:GNG and WP:WTAF. -->
 
    
Notable software for Bayesian networks include:
 
==History==
    
The term Bayesian network was coined by [[Judea Pearl]] in 1985 to emphasize:<ref>{{cite conference |last=Pearl |first=J. | name-list-format = vanc  |authorlink=Judea Pearl |year=1985 |title=Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning |conference=Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA
 
== See also ==
    
{{Portal|Mathematics}}
 
== References ==
    
{{Refbegin}}
 
== Further reading ==
    
* {{cite book | title = Bayesian Networks and BayesiaLab – A practical introduction for researchers|url={{google books |plainurl=y |id=etXXsgEACAAJ}} | first1 = Stefan | last1 = Conrady | first2 = Lionel | last2 = Jouffe | name-list-format = vanc | isbn = 978-0-9965333-0-0 | publisher = Bayesian USA | location = Franklin, Tennessee |date=2015-07-01 }}
 
== External links ==
     