第83行:
| Using the expansion for the joint probability function <math>\Pr(G,S,R)</math> and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator. For example, |
| | | |
− | 展开概率密度函数<math>\Pr(G,S,R)</math> ,并使用图中列出条件概率,我们可以算出分子和分母中的各个项。比如说,
| + | 展开概率函数<math>\Pr(G,S,R)</math>,并使用图中条件概率表(CPT)里列出的条件概率,我们可以算出分子和分母中的各个项。比如说, |
| | | |
| | | |
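上面的枚举求和可以用几行代码演示。下面是一个最小示例:按链式分解 <math>\Pr(G,S,R)=\Pr(R)\Pr(S\mid R)\Pr(G\mid S,R)</math> 枚举所有项,计算 <math>\Pr(R=T\mid G=T)</math>。注意:代码中的 CPT 数值是常见洒水器示例的假设取值,并非来自本文正文。

```python
# 洒水器贝叶斯网络:用枚举法计算 P(R=T | G=T)。
# 以下 CPT 数值为假设的示例值(非正文给出)。
P_R = {True: 0.2, False: 0.8}                # P(R)
P_S_given_R = {True: 0.01, False: 0.4}       # P(S=T | R)
P_G_given_SR = {                             # P(G=T | S, R)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.0,
}

def joint(g, s, r):
    """P(G=g, S=s, R=r) = P(R) * P(S|R) * P(G|S,R),即链式分解。"""
    p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    return P_R[r] * p_s * p_g

# 分子:固定 R=T,对 S 求和;分母:对 S 和 R 都求和
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s in (True, False) for r in (True, False))
print(round(num / den, 4))   # P(R=T | G=T) ≈ 0.3577
```

在这组假设数值下,看到草地是湿的会把下雨的后验概率从先验的 0.2 提高到约 0.36。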
第107行:
| To answer an interventional question, such as "What is the probability that it would rain, given that we wet the grass?" the answer is governed by the post-intervention joint distribution function |
| | | |
− | 这个模型还回答干预性的问题,比如“现在我们把草弄湿了,那么下雨的可能性有多大? ”答案取决于干预后的'''<font color="#ff8000">联合分布函数 Joint distribution function</font>''' | + | 这个模型还回答干预性的问题,比如“现在我们把草弄湿了,那么下雨的可能性有多大? ”答案取决于干预后的联合分布函数: |
| | | |
| : <math>\Pr(S,R\mid\text{do}(G=T)) = \Pr(S\mid R) \Pr(R)</math> |
第116行:
| obtained by removing the factor <math>\Pr(G\mid S,R)</math> from the pre-intervention distribution. The do operator forces the value of G to be true. The probability of rain is unaffected by the action: |
| | | |
− | 该分布通过从干预前的分布中去除因子<math>\Pr(G\mid S,R)</math>得到,其中do算子强行使 G 的值为真。下雨的可能性不受此干预的影响: | + | 该分布通过从干预前的分布中去除因子<math>\Pr(G\mid S,R)</math>得到,其中do算子强行使 G 的值为真。演算后可知下雨的可能性不受此干预的影响: |
| | | |
| : <math>\Pr(R\mid\text{do}(G=T)) = \Pr(R).</math> |
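截断分解可以直接验证:删去因子 <math>\Pr(G\mid S,R)</math> 后,对 S 求和即得 <math>\Pr(R\mid\text{do}(G=T))=\Pr(R)</math>。下面的示例演示这一点(CPT 数值为假设的示例值,非正文给出)。

```python
# 干预 do(G=T):截断分解 P(S,R | do(G=T)) = P(S|R) P(R)。
# 验证 P(R=T | do(G=T)) = P(R=T),即把草弄湿不改变下雨的概率。
# CPT 数值为假设的示例值。
P_R = {True: 0.2, False: 0.8}            # P(R)
P_S_given_R = {True: 0.01, False: 0.4}   # P(S=T | R)

def post_intervention(s, r):
    """P(S=s, R=r | do(G=T)) = P(S|R) * P(R):因子 P(G|S,R) 已被删去。"""
    p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    return p_s * P_R[r]

# 对 S 求和得到边缘分布 P(R=T | do(G=T))
p_r_do = sum(post_intervention(s, True) for s in (True, False))
print(round(p_r_do, 6))   # 0.2,与先验 P(R=T) 相同
```

对比上一节的观测条件概率 <math>\Pr(R=T\mid G=T)\approx 0.36</math>:观测到草湿会提高对下雨的推断,而人为把草弄湿则不会。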
第124行:
| To predict the impact of turning the sprinkler on: |
| | | |
− | 预测开启洒水装置的影响:
| + | 现在再预测开启洒水装置的影响: |
| | | |
| : <math>\Pr(R,G\mid\text{do}(S=T)) = \Pr(R)\Pr(G\mid R,S=T)</math> |
第133行:
| with the term <math>\Pr(S=T\mid R)</math> removed, showing that the action affects the grass but not the rain. |
| | | |
− | 移除<math>\Pr(S=T\mid R)</math> 这个项,表明这种行为影响的是草,而不是雨。 | + | 移除<math>\Pr(S=T\mid R)</math> 这个项表明这种行为影响的是草,而不是雨。 |
| | | |
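这一截断分解同样可以数值化:对 R 求和即得打开洒水器后草地变湿的概率 <math>\Pr(G=T\mid\text{do}(S=T))=\sum_R \Pr(R)\Pr(G=T\mid R,S=T)</math>。下面是一个最小示例(CPT 数值为假设的示例值,非正文给出)。

```python
# 干预 do(S=T):删去因子 P(S|R) 后,P(R,G | do(S=T)) = P(R) P(G|R,S=T)。
# 由此计算打开洒水器后草地变湿的概率。CPT 数值为假设的示例值。
P_R = {True: 0.2, False: 0.8}            # P(R)
P_G_given_SR = {                         # P(G=T | S, R)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.0,
}

# P(G=T | do(S=T)) = Σ_R P(R) * P(G=T | R, S=T)
p_wet = sum(P_R[r] * P_G_given_SR[(True, r)] for r in (True, False))
print(round(p_wet, 3))   # 0.2*0.99 + 0.8*0.90 = 0.918
```

注意求和中 R 的分布仍是先验 <math>\Pr(R)</math>:干预只影响草地,不影响下雨。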
| | | |
第140行:
| These predictions may not be feasible given unobserved variables, as in most policy evaluation problems. The effect of the action <math>\text{do}(x)</math> can still be predicted, however, whenever the back-door criterion is satisfied. It states that, if a set Z of nodes can be observed that d-separates (or blocks) all back-door paths from X to Y then |
| | | |
− | 这些预测可能不可行的给予未观测的变量,因为在大多数政策评估问题。但是,只要满足后门准则,仍然可以预测操作 math text { do }(x) / math 的效果。它指出,如果一组 z 节点可以观察到 d-分隔(或阻塞)从 x 到 y 的所有后门路径
| + | 与大多数政策评估问题一样,由于存在未观测变量,这些预测可能并不可行。但是,只要满足后门准则,仍然可以预测 <math>\text{do}(x)</math> 的效果。后门准则指出:如果存在一组可观测的变量 ''Z'',能 d-分隔(或阻塞)从 ''X'' 到 ''Y'' 的所有后门路径,则有 |
| | | |
| : <math>\Pr(Y,Z\mid\text{do}(x)) = \frac{\Pr(Y,Z,X=x)}{\Pr(X=x\mid Z)}.</math> |
第147行:
| A back-door path is one that ends with an arrow into ''X''. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set ''Z'' = ''R'' is admissible for predicting the effect of ''S'' = ''T'' on ''G'', because ''R'' ''d''-separates the (only) back-door path ''S'' ← ''R'' → ''G''. However, if ''S'' is not observed, no other set ''d''-separates this path and the effect of turning the sprinkler on (''S'' = ''T'') on the grass (''G'') cannot be predicted from passive observations. In that case ''P''(''G'' | do(''S'' = ''T'')) is not "identified". This reflects the fact that, lacking interventional data, the observed dependence between ''S'' and ''G'' is due to a causal connection or is spurious |
| | | |
− | A back-door path is one that ends with an arrow into X. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set Z = R is admissible for predicting the effect of S = T on G, because R d-separates the (only) back-door path S ← R → G. However, if S is not observed, no other set d-separates this path and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations. In that case P(G | do(S = T)) is not "identified". This reflects the fact that, lacking interventional data, the observed dependence between S and G is due to a causal connection or is spurious | + | A back-door path is one that ends with an arrow into ''X''. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set Z = R is admissible for predicting the effect of S = T on G, because R d-separates the (only) back-door path S ← R → G. However, if S is not observed, no other set d-separates this path and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations. In that case P(G | do(S = T)) is not "identified". This reflects the fact that, lacking interventional data, the observed dependence between S and G is due to a causal connection or is spurious |
| | | |
− | 后门路径是以 x 的箭头结束的路径。满足后门标准的集合称为“充分”或“可接受”例如,集合 z r 可以用来预测 s t 对 g 的影响,因为 rd- 分离了(仅)后门路径 s ← r → g。但是,如果 s 没有被观测到,没有其他集合 d-分离这条路径,并且不能从被动观测预测草(g)上喷头打开的影响。在这种情况下,p (g | do (s t))不被“识别”。这反映了这样一个事实,即缺乏干预性数据,所观察到的 s 和 g 之间的依赖性是由于一个因果关系或是伪造的
| + | 后门路径是一条以指向 ''X'' 的箭头结尾的路径。满足后门准则的(观测变量)集合称为“充分的”或“有效的”。例如,集合 ''Z'' = ''R'' 能有效地预测 ''S'' = ''T'' 对 ''G'' 的影响,因为 ''R'' ''d''-分隔了(仅有的)后门路径 ''S'' ← ''R'' → ''G''。但是,如果 ''S'' 没有被观测到,就没有其他集合能 ''d''-分隔这条路径,也就无法从被动观测数据中预测“打开洒水器”(''S'' = ''T'')对草地(''G'')的影响。在这种情况下,''P''(''G'' | do(''S'' = ''T''))没有被“识别”。这反映了一个事实:在缺乏干预性数据时,无法确认观察到的 ''S'' 和 ''G'' 之间的依赖是真正的因果关系,还是由共同原因 ''R'' 引起的表面相关(参见辛普森悖论)。 |
| | | |
| (apparent dependence arising from a common cause, ''R''). (see [[Simpson's paradox]]) |
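可以数值验证 ''Z'' = ''R'' 的有效性:按后门公式 <math>\Pr(G,R\mid\text{do}(S=T))=\Pr(G,R,S=T)/\Pr(S=T\mid R)</math> 对 R 求和,其结果应与截断分解 <math>\sum_R \Pr(R)\Pr(G=T\mid R,S=T)</math> 一致。下面是一个最小示例(CPT 数值为假设的示例值,非正文给出)。

```python
# 后门调整示例:取 Z = R,验证后门公式与截断分解给出相同的
# P(G=T | do(S=T))。CPT 数值为假设的示例值。
P_R = {True: 0.2, False: 0.8}            # P(R)
P_S_given_R = {True: 0.01, False: 0.4}   # P(S=T | R)
P_G_given_SR = {                         # P(G=T | S, R)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.0,
}

def joint(g, s, r):
    """P(G=g, S=s, R=r) = P(R) * P(S|R) * P(G|S,R)。"""
    p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    return P_R[r] * p_s * p_g

# 后门公式:P(G=T | do(S=T)) = Σ_R P(G=T, R, S=T) / P(S=T | R)
backdoor = sum(joint(True, True, r) / P_S_given_R[r] for r in (True, False))
# 截断分解:P(G=T | do(S=T)) = Σ_R P(R) * P(G=T | R, S=T)
truncated = sum(P_R[r] * P_G_given_SR[(True, r)] for r in (True, False))
print(round(backdoor, 3), round(truncated, 3))   # 两者相同:0.918 0.918
```

后门公式只用到可观测量(联合分布与 <math>\Pr(S\mid R)</math>),因此当 R 可观测时,干预效果可以从观测数据中估计出来。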
第163行:
| To determine whether a causal relation is identified from an arbitrary Bayesian network with unobserved variables, one can use the three rules of "do-calculus" and test whether all do terms can be removed from the expression of that relation, thus confirming that the desired quantity is estimable from frequency data. |
| | | |
− | 为了确定一个因果关系是否可以从一个任意的含有未观测变量的'''<font color="#ff8000"> 贝氏网络Bayesian network</font>'''中识别出来,我们可以使用“ do-calculus”的三个规则来检验是否所有的 do 项都可以从这个关系的表达式中去掉,从而确认所需的量是可以从频率数据中估计出来的。
| + | 为了确定一个因果关系能否从任意一个含有未观测变量的贝叶斯网络中识别出来,我们可以使用“do-演算”的三条规则,检验是否所有 do 项都能从该关系的表达式中消去,从而确认所需的量可以从频率数据中估计出来。 |
| | | |
| | | |
第171行:
| Using a Bayesian network can save considerable amounts of memory over exhaustive probability tables, if the dependencies in the joint distribution are sparse. For example, a naive way of storing the conditional probabilities of 10 two-valued variables as a table requires storage space for <math>2^{10} = 1024</math> values. If no variable's local distribution depends on more than three parent variables, the Bayesian network representation stores at most <math>10\cdot2^3 = 80</math> values. |
| | | |
− | 如果依赖关系在联合分布中是稀疏的,那么在详尽的概率表上使用贝氏网路分布可以节省相当大的内存。例如,将10个二值变量的条件概率存储为一个表的简单方法需要存储 math 2 ^ {10}1024 / math 值。如果没有变量的局部分布依赖于3个以上的父变量,那么'''<font color="#ff8000"> 贝氏网络Bayesian network</font>'''表示最多只存储 math 10 cdot2 ^ 380 / math 值。
| + | 如果联合分布中的依赖关系是稀疏的(变量间依赖较少,即对应图模型中的边较少),那么相对于存储一张完整的概率表,使用贝叶斯网络可以节省相当多的内存。例如,将10个二值变量的条件概率存储为一张完整的表,需要存储 <math>2^{10} = 1024</math> 个值;而若每个变量的局部分布至多依赖3个父变量,贝叶斯网络表示至多只需存储 <math>10\cdot2^3 = 80</math> 个值。 |
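这笔参数数量的账可以直接算出来,并推广到一般情形:n 个二值变量的完整联合概率表需要 <math>2^n</math> 个数值,而每个变量至多有 k 个父节点的贝叶斯网络至多需要 <math>n\cdot 2^k</math> 个数值(每个变量一张 CPT)。

```python
# 参数数量对比:完整联合概率表 vs. 贝叶斯网络的局部 CPT。
# n 个二值变量的完整表需要 2**n 个数值;
# 若每个变量至多依赖 k 个父变量,则至多需要 n * 2**k 个数值。
n, k = 10, 3
full_table = 2 ** n          # 完整联合概率表的表项数
bayes_net = n * 2 ** k       # 贝叶斯网络 CPT 表项数的上界
print(full_table, bayes_net)  # 1024 80
```

前者随变量数指数增长,后者只随变量数线性增长(指数只出现在父节点数 k 上),这正是稀疏依赖带来的节省。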
| | | |
| | | |
第179行:
| One advantage of Bayesian networks is that it is intuitively easier for a human to understand (a sparse set of) direct dependencies and local distributions than complete joint distributions. |
| | | |
− | '''<font color="#ff8000"> 贝氏网络Bayesian networks</font>'''的一个优点是它比'''<font color="#ff8000"> 完全联合分布Complete joint distributions</font>'''更易于人类直观地理解(一组稀疏的)直接依赖关系和局部分布。
| + | 相比于完全版的联合概率分布,理解(一组稀疏的)直接的变量间依赖关系和局部的概率分布对于人类来说要更加直观易懂。这正是贝叶斯网络的一个优点。 |
| | | |
==Inference and learning推论与学习==