|}

* The first transition probability matrix is a permutation matrix, which is invertible. It has the highest determinism and no degeneracy, leading to the maximum EI.
* In the second matrix, the first three states transition to one another with equal probability (1/3), resulting in the lowest determinism but no degeneracy; its EI is 0.81.
* The third matrix is deterministic, but since three of the states transition to the first state, it is impossible to infer from state 1 which previous state led to it. It therefore has high degeneracy, and its EI is also 0.81, the same as the second.
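
These three cases can also be checked numerically. The sketch below is illustrative rather than part of the original article: the three matrices are assumed 4-state reconstructions of the examples above, and EI is computed per Equation {{EquationNote|2}} as the average KL divergence (in bits) between each row of P and the average row.

<syntaxhighlight lang="python">
import numpy as np

def ei(P):
    """EI of a transition probability matrix P: the average KL
    divergence (in bits) between each row and the average row."""
    P = np.asarray(P, dtype=float)
    p_bar = np.broadcast_to(P.mean(axis=0), P.shape)  # average transition distribution
    mask = P > 0                                      # treat 0*log(0) as 0
    return float(np.sum(P[mask] * np.log2(P[mask] / p_bar[mask])) / len(P))

# Assumed reconstructions of the three example matrices (N = 4):
P1 = np.eye(4)[[1, 2, 3, 0]]                              # permutation: max EI = log2(4) = 2
P2 = np.array([[1/3, 1/3, 1/3, 0]] * 3 + [[0, 0, 0, 1]])  # noisy among the first 3 states
P3 = np.array([[1, 0, 0, 0]] * 3 + [[0, 0, 0, 1]])        # deterministic but degenerate
for P in (P1, P2, P3):
    print(round(ei(P), 2))                                # 2.0, 0.81, 0.81
</syntaxhighlight>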
      
===Normalized Determinism and Degeneracy===
In [[Erik Hoel]]'s original paper <ref name=hoel_2013 />, determinism and degeneracy are presented in a normalized form by dividing them by a scale-dependent quantity. To distinguish between the two, we refer to the normalized quantities as the ''determinism coefficient'' and the ''degeneracy coefficient''.
Specifically, [[Erik Hoel]] et al. decomposed the normalized effective information, Eff, into two terms corresponding to the determinism coefficient and the degeneracy coefficient:
    
<math>
Eff=\frac{EI}{\log_2N}=\underbrace{\frac{\log_2N-\langle H(P_i)\rangle}{\log_2N}}_{\text{determinism coefficient}}-\underbrace{\frac{\log_2N-H(\bar{P})}{\log_2N}}_{\text{degeneracy coefficient}}
</math>

where <math>\langle H(P_i)\rangle</math> is the average entropy of the row vectors of P, and <math>H(\bar{P})</math> is the entropy of the average row vector <math>\bar{P}=(\bar{p}_{\cdot 1},\cdots,\bar{p}_{\cdot N})</math>.
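
As an illustrative numerical check (reusing the <code>ei</code> helper and the matrix <code>P2</code> from the sketch above, with entropies in bits), the two coefficients can be computed directly and compared against <math>EI/\log_2N</math>:

<syntaxhighlight lang="python">
def coefficients(P):
    """Determinism and degeneracy coefficients, normalized by log2(N)."""
    P = np.asarray(P, dtype=float)
    N = len(P)
    H = lambda v: -np.sum(v[v > 0] * np.log2(v[v > 0]))  # Shannon entropy (bits)
    determinism = (np.log2(N) - np.mean([H(row) for row in P])) / np.log2(N)
    degeneracy = (np.log2(N) - H(P.mean(axis=0))) / np.log2(N)
    return determinism, degeneracy

det, deg = coefficients(P2)
print(np.isclose(det - deg, ei(P2) / np.log2(len(P2))))  # True: Eff = det - deg
</syntaxhighlight>
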
==Properties of the EI Function==
From Equation {{EquationNote|2}}, we can see that EI is a function of the elements of the transition probability matrix P, each representing the conditional probability of transitioning from one state to another. A natural question then arises: what mathematical properties does this function have? Does it have extreme points, and if so, where are they? Is it convex? What are its maximum and minimum values?
 
===Domain===
In the case of Markov chains with discrete states and discrete time, the domain of EI is clearly the transition probability matrix P. P is a matrix composed of [math]N\times N[/math] elements, each [math]p_{ij}\in[0,1][/math] representing a probability value. Additionally, each row must satisfy the normalization condition, meaning for all [math]i\in[1,N][/math]:
{{NumBlk|:|
 
<math>
||P_i||_1=\sum_{j=1}^N p_{ij}=1
</math>
|{{EquationRef|3}}}}

Thus, the domain of EI, which is the possible space for P, is not the entire [math]N^2[/math]-dimensional real space [math]\mathcal{R}^{N^2}[/math]. Due to the normalization condition {{EquationNote|3}}, it is a proper subspace of it. How can we express this subspace?
For any row vector [math]P_i[/math], its value range is a simplex in N-dimensional real space. For example, when [math]N=2[/math], this space is a line segment: [math]p_{i,1}+p_{i,2}=1[/math]. When [math]N=3[/math], this space is a planar region in three-dimensional space: [math]p_{i,1}+p_{i,2}+p_{i,3}=1[/math]. These two spaces are illustrated below:
[[文件:P1+p2=1.png|301x301像素|替代=]][[文件:P1+p2+p3=1.png|380x380像素|替代=]]
In the general case, we define the value range of the row vector [math]P_i[/math] in N-dimensional space as the simplex [math]\Delta=\left\{(p_1,\cdots,p_N)\left|\sum_{j=1}^Np_{j}=1,p_{j}\in[0,1]\right.\right\}[/math]. The Cartesian product of N such simplices forms the domain of EI:
    
<math>
Dom(EI)=\Delta\times\Delta\times\cdots\times\Delta=\Delta^N
</math>
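
To make the domain concrete, the following illustrative sketch samples transition matrices from [math]\Delta^N[/math] by drawing each row independently from the flat Dirichlet distribution on the simplex, then verifies the normalization condition {{EquationNote|3}}:

<syntaxhighlight lang="python">
rng = np.random.default_rng(0)

def random_tpm(N):
    """Sample a transition matrix from Dom(EI) = Δ^N: each row is an
    independent draw from the flat Dirichlet distribution on Δ."""
    return rng.dirichlet(np.ones(N), size=N)

P = random_tpm(4)
assert np.allclose(P.sum(axis=1), 1)  # every row satisfies Equation 3
</syntaxhighlight>
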
===First Derivative and Extreme Points===
Taking the first derivative of Equation {{EquationNote|2}} with respect to [math]p_{ij}[/math], and considering the normalization condition [math]\sum_{j=1}^Np_{ij}=1[/math], we get:

{{NumBlk|:|
 
<math>
\frac{\partial EI}{\partial p_{ij}}=\log\left(\frac{p_{ij}}{p_{iN}}\right)-\log\left(\frac{\bar{p}_{\cdot j}}{\bar{p}_{\cdot N}}\right),
</math>
|{{EquationRef|4}}}}
Here, <math>p_{ij}</math> denotes the conditional probability of transitioning from state i to state j in matrix P. Since each row of P is subject to the normalization constraint {{EquationNote|3}}, the EI function has <math>N(N-1)</math> degrees of freedom. We can select <math>1\leq i\leq N, 1\leq j\leq N-1</math>, with <math>p_{iN}</math> denoting the conditional probability in the i-th row and N-th column. <math>\bar{p}_{\cdot j}</math> and <math>\bar{p}_{\cdot N}</math> denote the average conditional probabilities of the j-th and N-th columns, respectively.
It is easy to see that this derivative is defined only when, for the selected [math]i,j\in[1,N][/math], the probabilities [math]p_{ij},p_{iN},\bar{p}_{\cdot j}\equiv \frac{\sum_{k=1}^Np_{kj}}{N},\bar{p}_{\cdot N}\equiv \frac{\sum_{k=1}^Np_{kN}}{N}[/math] are all greater than 0. Only when this condition is satisfied is EI differentiable in the vicinity of [math]p_{ij}[/math]; otherwise, if any of these terms is 0, the derivative does not exist.
Setting Equation {{EquationNote|4}} to 0 yields the extreme points. For any <math>1\leq i\leq N, 1\leq j\leq N-1</math>, the following equation holds:
    
<math>
\frac{p_{ij}}{p_{iN}}=\frac{\bar{p}_{\cdot j}}{\bar{p}_{\cdot N}}
</math>

Combined with the normalization condition, this implies that every row vector of P equals the average row vector, i.e., [math]p_{ij}=\bar{p}_{\cdot j}[/math] for all i and j.
It is straightforward to compute that in this case, <math>EI=0</math>, indicating that EI reaches an extreme point. From the second derivative of EI, we can easily determine that this is a minimum point. In other words, EI has many minimum points: as long as all the row vectors of the transition probability matrix are identical, EI will be zero, regardless of the specific distribution of these row vectors.
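
As a quick illustrative check (using the <code>ei</code> and <code>random_tpm</code> helpers sketched above), one can verify both claims numerically: EI vanishes whenever all rows are identical, and the analytic derivative matches a finite-difference estimate. Note the extra factor 1/N in the analytic expression, since the <code>ei</code> sketch averages the KL terms over rows:

<syntaxhighlight lang="python">
# A matrix with identical rows is a minimum point: EI = 0.
row = rng.dirichlet(np.ones(4))
P0 = np.tile(row, (4, 1))
print(np.isclose(ei(P0), 0))  # True

# Central-difference check of Equation 4 at a random interior point.
P, (i, j), eps = random_tpm(4), (1, 0), 1e-6
Q1, Q2 = P.copy(), P.copy()
Q1[i, j] += eps; Q1[i, -1] -= eps  # move along the simplex (row sums stay 1)
Q2[i, j] -= eps; Q2[i, -1] += eps
numeric = (ei(Q1) - ei(Q2)) / (2 * eps)
p_bar = P.mean(axis=0)
analytic = (np.log2(P[i, j] / P[i, -1]) - np.log2(p_bar[j] / p_bar[-1])) / len(P)
print(np.isclose(numeric, analytic))  # True
</syntaxhighlight>
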
===Second Derivative and Convexity===
To explore the convexity of the EI function, we can compute its second derivative <math>\frac{\partial^2 EI}{\partial p_{ij}\partial p_{st}}</math>, where <math>1\leq s \leq N, 1\leq t \leq N-1</math>. First, we introduce the function symbol <math>\delta_{i,j}</math>:
    
<math>
\delta_{i,j}=\begin{cases}1, & \text{if } i=j\\ 0, & \text{if } i\neq j\end{cases}
</math>

With this notation, differentiating Equation {{EquationNote|4}} once more gives:

<math>
\frac{\partial^2 EI}{\partial p_{ij}\partial p_{st}}=\frac{1}{\ln2}\left[\delta_{i,s}\left(\frac{\delta_{j,t}}{p_{ij}}+\frac{1}{p_{iN}}\right)-\frac{1}{N}\left(\frac{\delta_{j,t}}{\bar{p}_{\cdot j}}+\frac{1}{\bar{p}_{\cdot N}}\right)\right]
</math>
Moreover, the second derivative is positive when <math>i=s</math> and negative when <math>i\ne s</math>. Therefore, EI is neither convex nor concave.
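
The mixed signs can also be observed numerically. This illustrative sketch (again using the helpers above) estimates two second derivatives by central differences, one with <math>i=s</math> and one with <math>i\ne s</math>:

<syntaxhighlight lang="python">
def ei_second_diff(P, i, j, s, t, eps=1e-4):
    """Central-difference estimate of d^2 EI / (dp_ij dp_st) along the simplex."""
    def shift(Q, a, b, h):
        Q = Q.copy(); Q[a, b] += h; Q[a, -1] -= h
        return Q
    f = lambda u, v: ei(shift(shift(P, i, j, u), s, t, v))
    return (f(eps, eps) - f(eps, -eps) - f(-eps, eps) + f(-eps, -eps)) / (4 * eps**2)

P = random_tpm(4)
print(ei_second_diff(P, 0, 1, 0, 1) > 0)  # i = s: positive
print(ei_second_diff(P, 0, 1, 2, 1) < 0)  # i != s: negative
</syntaxhighlight>
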
===Minimum Value===
From the first derivative of EI, we discussed the extreme value problem, showing that EI reaches a minimum of 0 when all the row vectors of the transition matrix are identical. Is this minimum value the lowest possible?
We know that, based on Equation {{EquationNote|2}}, EI is essentially a form of KL divergence, which is always non-negative. Thus:
    
<math>
EI\geq 0
</math>
The equality holds when [math]p_{ij}=\bar{p}_{\cdot j}[/math] for all [math]i,j[/math]. Hence, 0 is the minimum value of EI, attained exactly at the minimum points [math]p_{ij}=\bar{p}_{\cdot j}=\frac{1}{N}\sum_{k=1}^Np_{kj}[/math], i.e., when every row of P equals the average row.
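
A final illustrative sanity check (with the helpers above): EI stays non-negative across random draws from the domain, and a matrix whose rows all coincide reaches the minimum value 0 exactly:

<syntaxhighlight lang="python">
samples = [random_tpm(5) for _ in range(1000)]
print(min(ei(P) for P in samples) >= 0)                 # True: EI >= 0 everywhere
print(ei(np.tile(rng.dirichlet(np.ones(5)), (5, 1))))   # 0.0 at a minimum point
</syntaxhighlight>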