==Vector Form of EI in Markov Chains==

We can also represent the [[Transitional Probability Matrix]] [math]P[/math] as a concatenation of [math]N[/math] row vectors, i.e.:
<math>
P=(P_1,P_2,\cdots,P_N)^T
</math>
Where [math]P_i[/math] is the [math]i[/math]-th row vector of matrix [math]P[/math], and it satisfies the normalization condition for conditional probabilities: [math]||P_i||_1=1[/math], where [math]||\cdot||_1[/math] denotes the 1-norm of a vector. Then, EI can be written as follows:
{{NumBlk|:|
<math>
\begin{aligned}
EI &= \frac{1}{N}\sum_{i=1}^N D_{KL}(P_i||\overline{P})
\end{aligned}
</math>
|{{EquationRef|2}}}}
By averaging the columns of the matrix, we obtain the average transition vector <math>\overline{P}=\sum_{k=1}^N P_k/N</math>. [math]D_{KL}[/math] is the [[KL Divergence]] between two distributions. Therefore, EI is the average [[KL Divergence]] between each row transition vector [math]P_i[/math] and the average transition vector [math]\overline{P}[/math].
For the three [[Transitional Probability Matrices]] listed above, their respective EI values are: 2 bits, 1 bit, and 0 bits. This shows that if more 0s or 1s appear in the [[Transitional Probability Matrix]] (i.e., if more of the row vectors are [[One-hot Vectors]], where one position is 1 and the others are 0), the EI value will be higher. In other words, the more deterministic the jump from one time to the next, the higher the EI value tends to be. However, this observation is not entirely precise, and more exact conclusions are provided in the following sections.
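To make the vector form concrete, here is a minimal Python sketch of Equation {{EquationNote|2}}. The function name <code>effective_information</code> is ours, and the three matrices are hypothetical stand-ins (the matrices referenced above are not reproduced here) chosen so that their EI values are 2, 1, and 0 bits respectively:

```python
import numpy as np

def effective_information(P, base=2):
    """EI as the average KL divergence between each row P_i
    and the average row vector P_bar (the vector form of EI)."""
    P = np.asarray(P, dtype=float)
    P_bar = P.mean(axis=0)          # average transition vector
    ei = 0.0
    for row in P:
        mask = row > 0              # KL terms with P_i(j) = 0 contribute nothing
        ei += np.sum(row[mask] * np.log(row[mask] / P_bar[mask]))
    return ei / (P.shape[0] * np.log(base))   # average, converted to bits

P1 = np.eye(4)                      # every row one-hot, all distinct targets
P2 = np.array([[1, 0, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 1, 0]], dtype=float)    # one-hot rows, only 2 targets
P3 = np.full((4, 4), 0.25)          # maximally random rows

print(effective_information(P1))    # ≈ 2.0 bits
print(effective_information(P2))    # ≈ 1.0 bit
print(effective_information(P3))    # ≈ 0.0 bits
```

The more one-hot (deterministic) the rows are, and the more distinct their target states, the larger the EI, matching the observation above.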
    
==Normalization==

Clearly, the magnitude of EI (Effective Information) is related to the size of the state space, which poses challenges when comparing [[Markov Chains]] of different scales. To address this issue, we need a [[Causal Measure]] that is as independent of scale effects as possible. Therefore, we normalize EI to derive a metric that is independent of the system size.
According to the work of [[Erik Hoel]] and [[Tononi]], the normalization uses the entropy under a [[Uniform Distribution]] (i.e., [[Maximum Entropy]]), <math>\log N</math>, as the denominator, where [math]N[/math] is the number of states in the state space [math]\mathcal{X}[/math]<ref name=hoel_2013 />. Thus, the normalized EI becomes:
    
<math>
Eff = \frac{EI}{\log N}
</math>
This normalized metric is also referred to as '''Effectiveness'''.
    
However, when dealing with continuous state variables, normalizing EI by using the logarithm of the number of states in the state space may not be suitable, as the state number often depends on the dimensionality and the resolution of real numbers.
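The normalization is a one-line extension of the EI computation. A minimal sketch, assuming the base-2 (bit) convention used above; the function name <code>effectiveness</code> is ours:

```python
import numpy as np

def effectiveness(P):
    """Normalized EI: Eff = EI / log2(N) for an N-state transition matrix."""
    P = np.asarray(P, dtype=float)
    N = P.shape[0]
    P_bar = P.mean(axis=0)          # average transition vector
    # EI in bits: average KL divergence of each row from the average row
    ei = sum(np.sum(r[r > 0] * np.log2(r[r > 0] / P_bar[r > 0])) for r in P) / N
    return ei / np.log2(N)          # divide by the maximum-entropy value

print(effectiveness(np.eye(4)))            # ≈ 1.0 (fully effective)
print(effectiveness(np.full((4, 4), 0.25)))  # ≈ 0.0 (no effectiveness)
```

Because <math>0 \le EI \le \log N</math> for an [math]N[/math]-state chain, Effectiveness always lies in [math][0, 1][/math], which makes chains of different sizes comparable.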
==Determinism and Degeneracy==

===Decomposition of EI===
From Equation {{EquationNote|1}}, we see that EI can actually be decomposed into two terms:
 
<math>
EI = -\langle H(P_i)\rangle + H(\overline{P})
</math>
Similarly, in the context of Markov chains, EI can be decomposed as:
{{NumBlk|:|
<math>
\begin{aligned}
EI &= \frac{1}{N}\sum_{i=1}^N D_{KL}(P_i||\overline{P}) \\
&= -\frac{1}{N}\sum_{i=1}^N H(P_i) + H(\overline{P})
\end{aligned}
</math>
|{{EquationRef|tow_terms}}}}
Where the first term, [math]-\langle H(P_i)\rangle\equiv -\frac{1}{N}\sum_{i=1}^N H(P_i)[/math], represents the negative average entropy of each row vector [math]P_i[/math], which measures the '''Determinism''' of the Markov transition matrix.

The second term, [math]H(\overline{P})[/math], is the entropy of the average row vector, where [math]\overline{P}\equiv \frac{1}{N}\sum_{i=1}^N P_i[/math] is the average of all [math]N[/math] row vectors, and it measures the '''Non-degeneracy''' of the Markov transition matrix.
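The decomposition can be checked numerically: the average KL divergence of the rows from their mean equals the determinism term plus the non-degeneracy term. A short sketch (the 4-state matrix is chosen arbitrarily for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A hypothetical 4-state transition matrix (each row sums to 1)
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.8, 0.2, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.5]])

P_bar = P.mean(axis=0)                                  # average row vector
determinism_term = -np.mean([entropy(r) for r in P])    # -<H(P_i)>
non_degeneracy_term = entropy(P_bar)                    # H(P_bar)

# EI computed directly as the average KL divergence of rows from P_bar
ei = np.mean([np.sum(r[r > 0] * np.log2(r[r > 0] / P_bar[r > 0])) for r in P])

print(ei, determinism_term + non_degeneracy_term)       # the two agree
assert np.isclose(ei, determinism_term + non_degeneracy_term)
```

The identity holds because the cross-entropy of each row against [math]\overline{P}[/math], averaged over rows, collapses to [math]H(\overline{P})[/math] when [math]\overline{P}[/math] is the mean of the rows.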
   
===Determinism and Degeneracy===
 
In the above definition, the determinism and non-degeneracy terms are negative. To prevent this, we redefine the determinism of a Markov chain transition matrix [math]P[/math] as: