Line 108: |
Line 108: |
| | | |
| The introduction of the do-operator makes EI distinct from other information metrics. The key difference is that EI is solely a function of the causal mechanism, which allows it to more precisely capture the essence of causality compared to other metrics like transfer entropy. However, this also means that EI requires knowledge of or access to the causal mechanism, which may be challenging if only observational data is available. | | The introduction of the do-operator makes EI distinct from other information metrics. The key difference is that EI is solely a function of the causal mechanism, which allows it to more precisely capture the essence of causality compared to other metrics like transfer entropy. However, this also means that EI requires knowledge of or access to the causal mechanism, which may be challenging if only observational data is available. |
− | ==Why Intervene to a Uniform Distribution?== | + | ==Why Intervene to a Uniform Distribution?== |
− | In [[Erik Hoel]]'s original definition, the [[do操作|''do'' operation]] intervenes on the cause variable [math]X[/math], setting it to the [[均匀分布|uniform distribution]] over its domain [math]\mathcal{X}[/math] (that is, the [[最大熵分布|maximum entropy distribution]])<ref name="hoel_2013">{{cite journal|last1=Hoel|first1=Erik P.|last2=Albantakis|first2=L.|last3=Tononi|first3=G.|title=Quantifying causal emergence shows that macro can beat micro|journal=Proceedings of the National Academy of Sciences|volume=110|issue=49|page=19790–19795|year=2013|url=https://doi.org/10.1073/pnas.1314922110}}</ref><ref name="hoel_2017">{{cite journal|author1=Hoel, E.P.|title=When the Map Is Better Than the Territory|journal=Entropy|year=2017|volume=19|page=188|url=https://doi.org/10.3390/e19050188}}</ref>. Why intervene to a [[均匀分布|uniform distribution]] in particular, and would other distributions work as well?
| |
| | | |
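− | First, as argued in the previous subsection, the point of the [[do操作|''do'' operation]] is to let EI characterize the properties of the [[因果机制|causal mechanism]] [math]f[/math] more clearly. This requires cutting the links between the cause variable [math]X[/math] and all other variables and replacing its distribution, so that the EI measure no longer depends on the original distribution of [math]X[/math].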
| + | In Erik Hoel's original definition, the ''do'' operation intervenes on the cause variable [math]X[/math], setting it to the uniform distribution over its domain [math]\mathcal{X}[/math] (which is also the maximum entropy distribution). Why intervene to a uniform distribution in particular, and would other distributions work as well? |
| | | |
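− | The input variable is intervened to a [[均匀分布|uniform distribution]] precisely so that the properties of the [[因果机制|causal mechanism]] itself are characterized more faithfully.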
| + | First, as argued in the previous subsection, the purpose of the ''do'' operation is to let Effective Information (EI) describe the nature of the causal mechanism [math]f[/math] more clearly. It is therefore necessary to sever the links between the cause variable [math]X[/math] and all other variables and to replace its distribution, so that the EI metric no longer depends on the original distribution of [math]X[/math]. |
| | | |
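− | This is because, when [math]\mathcal{X}[/math] and [math]\mathcal{Y}[/math] are both finite sets, the causal mechanism [math]f\equiv Pr(Y=y|X=x)[/math] becomes a matrix with [math]\#(\mathcal{X})[/math] rows and [math]\#(\mathcal{Y})[/math] columns, and we can expand the definition of EI:{{NumBlk|:|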
| + | The input variable is intervened to a uniform distribution precisely so that the properties of the causal mechanism itself are characterized more faithfully. |
| + | |
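| + | This is because, when [math]\mathcal{X}[/math] and [math]\mathcal{Y}[/math] are both finite sets, the causal mechanism [math]f\equiv Pr(Y=y|X=x)[/math] becomes a matrix with [math]\#(\mathcal{X})[/math] rows and [math]\#(\mathcal{Y})[/math] columns. We can then expand the definition of EI:{{NumBlk|:| |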
| <math> | | <math> |
| \begin{aligned} | | \begin{aligned} |
Line 124: |
Line 125: |
| \end{aligned} | | \end{aligned} |
| </math> | | </math> |
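− | |{{EquationRef|1}}}}The last line shows that EI consists of two terms: the first is the average of the negative entropies of the rows of the causal mechanism matrix, and the second is the entropy of the variable [math]Y[/math]. In the first term, the distribution [math]Pr(X=x)[/math] of [math]X[/math] serves as the weight of each row's entropy in that average. Only when all these weights are equal is every row of the causal mechanism matrix treated on an equal footing, and that is exactly the case in which [math]X[/math] is intervened to a uniform distribution. | + | |{{EquationRef|1}}}} |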
| + | |
| + | |
| + | The last line of the equation shows that EI consists of two terms: the first is the average of the negative entropies of the rows of the causal mechanism matrix, and the second is the entropy of the variable [math]Y[/math]. In the first term, the distribution [math]Pr(X=x)[/math] of [math]X[/math] serves as the weight given to each row's entropy in that average. Only when all these weights are equal, that is, when [math]X[/math] is intervened to a uniform distribution, is every row of the causal mechanism matrix treated equally. |
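The two-term decomposition can be checked numerically. The sketch below is a minimal illustration, not the procedure of the cited papers. It assumes the mechanism is given as a row-stochastic list of lists `tpm[x][y]` = Pr(Y=y|X=x) and uses base-2 logarithms (bits). The helper names `entropy` and `effective_information` are hypothetical. EI is computed as the equally weighted (1/N) average of the negative row entropies plus the entropy of Y under the uniform intervention.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def effective_information(tpm):
    """EI of tpm[x][y] = Pr(Y=y | X=x) with X intervened to a uniform distribution."""
    n = len(tpm)
    # First term: every row's negative entropy gets the same weight 1/n
    avg_neg_row_entropy = -sum(entropy(row) for row in tpm) / n
    # Second term: entropy of Y, whose distribution is the equally weighted mix of rows
    p_y = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    return avg_neg_row_entropy + entropy(p_y)

# Hypothetical 4-state examples
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
noisy = [[0.25] * 4 for _ in range(4)]
print(effective_information(identity))  # 2.0: rows have zero entropy, Y is uniform
print(effective_information(noisy))     # 0.0: every row is maximally noisy
```

With a uniform input, this two-term expression gives the same value as the mutual information between X and Y computed directly from the joint distribution.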
| | | |
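− | If the distribution is not uniform, the entropies of some rows are multiplied by larger weights and those of others by smaller ones. Such weights encode a kind of "bias", so EI could no longer reflect the intrinsic properties of the causal mechanism.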
| + | If the distribution is not uniform, some rows receive larger weights and others smaller ones. These weights encode a kind of "bias" that prevents EI from reflecting the intrinsic properties of the causal mechanism. |
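The "bias" introduced by unequal weights can be made concrete with a small example. The sketch below evaluates the same two-term expression with an arbitrary input distribution `p_x` in place of the uniform weights. The function name `weighted_mi` and the 2-state mechanism are hypothetical and chosen for illustration only. As before, it assumes a row-stochastic `tpm[x][y]` and logarithms in bits.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def weighted_mi(tpm, p_x):
    """Two-term expansion of I(X;Y) using an arbitrary input distribution p_x as row weights."""
    # First term: row entropies averaged with (possibly unequal) weights p_x
    first = -sum(w * entropy(row) for w, row in zip(p_x, tpm))
    # Second term: entropy of Y under the same weights
    p_y = [sum(w * row[j] for w, row in zip(p_x, tpm)) for j in range(len(tpm[0]))]
    return first + entropy(p_y)

# Hypothetical 2-state mechanism
tpm = [[1.0, 0.0],   # row 0: deterministic, row entropy 0 bits
       [0.5, 0.5]]   # row 1: maximally noisy, row entropy 1 bit

for p_x in ([0.5, 0.5], [0.9, 0.1], [0.1, 0.9]):
    print(p_x, round(weighted_mi(tpm, p_x), 4))
```

Only the uniform weights `[0.5, 0.5]` give the EI of this mechanism. Shifting the weights toward either row changes the result, even though `tpm` stays the same. The first term is affected directly, and the second term is affected through the distribution of Y.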
| | | |
| =Effective Information of Markov Chains= | | =Effective Information of Markov Chains= |
Line 218: |
Line 222: |
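| Clearly, the value of EI depends on the size of the state space. This is inconvenient when comparing [[马尔科夫链|Markov chains]] at different scales, where we want a [[因果效应度量|causal effect measure]] that is as insensitive to scale as possible. We therefore normalize EI to obtain a quantity that is independent of system size. | | Clearly, the value of EI depends on the size of the state space. This is inconvenient when comparing [[马尔科夫链|Markov chains]] at different scales, where we want a [[因果效应度量|causal effect measure]] that is as insensitive to scale as possible. We therefore normalize EI to obtain a quantity that is independent of system size. |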
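Identity mechanisms show the dependence on scale clearly. An N-state chain that maps every state to itself is perfectly deterministic and non-degenerate, yet its EI equals log N and so grows with N. Dividing by log N (the entropy of the uniform distribution over N states) maps all of these chains to 1. The sketch below is a minimal illustration under these assumptions. The helper names (`effective_information`, `normalized_ei`, `identity_tpm`) are hypothetical. Bits are used, but the ratio does not depend on the base of the logarithm. N ≥ 2 is required because log 1 = 0.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def effective_information(tpm):
    """EI of tpm[x][y] = Pr(Y=y | X=x) under a uniform intervention on X."""
    n = len(tpm)
    p_y = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    return entropy(p_y) - sum(entropy(row) for row in tpm) / n

def normalized_ei(tpm):
    """EI divided by log2 N, the entropy of the uniform distribution over N states."""
    n = len(tpm)
    if n < 2:
        raise ValueError("normalization needs at least 2 states (log 1 = 0)")
    return effective_information(tpm) / math.log2(n)

def identity_tpm(n):
    """Hypothetical deterministic mechanism mapping each state to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

for n in (2, 4, 8, 16):
    tpm = identity_tpm(n)
    print(n, effective_information(tpm), normalized_ei(tpm))
```

The raw EI goes up by one bit each time the number of states doubles, while the normalized value stays at 1 for every size.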
| | | |
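− | Following the work of [[Erik Hoel]], [[Tononi]] and colleagues, EI is normalized by dividing it by the entropy of the [[均匀分布|uniform distribution]] (i.e., the [[最大熵分布|maximum entropy distribution]]), namely <math>\log N</math>, where [math]N[/math] is the number of states in the state space [math]\mathcal{X}[/math]<ref name="hoel_2013" />. The normalized EI is then: | + | Following the work of [[Erik Hoel]], [[Tononi]] and colleagues, EI is normalized by dividing it by the entropy of the [[均匀分布|uniform distribution]] (i.e., the [[最大熵分布|maximum entropy distribution]]), namely <math>\log N</math>, where [math]N[/math] is the number of states in the state space [math]\mathcal{X}[/math]<ref name="hoel_2013">{{cite journal|last1=Hoel|first1=Erik P.|last2=Albantakis|first2=L.|last3=Tononi|first3=G.|title=Quantifying causal emergence shows that macro can beat micro|journal=Proceedings of the National Academy of Sciences|volume=110|issue=49|page=19790–19795|year=2013|url=https://doi.org/10.1073/pnas.1314922110}}</ref>. The normalized EI is then: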
| | | |
| <math> | | <math> |