Line 19: |
| In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as shannons, commonly called bits) obtained about one random variable through observing the other random variable. The concept of mutual information is intricately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" held in a random variable. |
| | | |
− | In probability theory and information theory, the mutual information of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as shannons, usually called bits) about one random variable that is obtained by observing the other random variable. The concept of mutual information is intricately linked to the concept of the entropy of a random variable, a basic notion in information theory that quantifies the expected "amount of information" contained in a random variable.
| + | In probability theory and information theory, the '''mutual information (MI)''' of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as ''shannons'', commonly called bits) obtained about one random variable by observing the other. The concept of mutual information is intricately linked to the entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" contained in a random variable. |
| | | |
| | | |
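As a concrete illustration of the idea, the following minimal Python sketch computes the mutual information, in bits (shannons), of a small made-up joint distribution of two binary variables; the probability table is purely illustrative.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical joint distribution p(x, y) of two binary random variables
# (rows index x, columns index y); the values are illustrative only.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)   # marginal distribution of X
p_y = p_xy.sum(axis=0)   # marginal distribution of Y

# I(X;Y) = sum over x,y of p(x,y) * log2( p(x,y) / (p(x) p(y)) ), in bits
mi_bits = sum(
    p_xy[i, j] * np.log2(p_xy[i, j] / (p_x[i] * p_y[j]))
    for i in range(2) for j in range(2)
    if p_xy[i, j] > 0
)
print(f"I(X;Y) = {mi_bits:.4f} bits")  # about 0.278 bits for this table
</syntaxhighlight>

With base-2 logarithms the result is in shannons (bits); using natural logarithms instead would give the same quantity in nats.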
Line 29: |
| Not limited to real-valued random variables and linear dependence like the correlation coefficient, MI is more general and determines how different the joint distribution of the pair <math>(X,Y)</math> is from the product of the marginal distributions of <math>X</math> and <math>Y</math>. MI is the expected value of the pointwise mutual information (PMI). |
| | | |
− | Not limited to real-valued random variables and linear dependence such as the correlation coefficient, MI is more general: it determines how different the joint distribution of the pair <math>(X,Y)</math> is from the product of the marginal distributions of <math>X</math> and <math>Y</math>. MI is the expected value of the pointwise mutual information.
| + | Not limited to real-valued random variables and linear dependence such as the correlation coefficient, mutual information is more general: it determines how different the joint distribution of the pair <math>(X,Y)</math> is from the product of the marginal distributions of <math>X</math> and <math>Y</math>. Mutual information is the expected value of the '''pointwise mutual information (PMI)'''. |
| | | |
| | | |
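Both points can be seen in a toy example: the sketch below (with an arbitrarily chosen dependence <math>Y = X^2</math> and a symmetric <math>X</math>) has zero Pearson correlation but strictly positive mutual information, computed here as the expectation of the pointwise mutual information.

<syntaxhighlight lang="python">
import numpy as np

# X uniform on {-1, 0, 1} and Y = X^2: a dependence with zero linear correlation.
xs = np.array([-1, 0, 1])
p_x = np.array([1/3, 1/3, 1/3])

# Joint distribution over (x, y), with y in {0, 1}, and the marginal of Y.
pairs = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}
p_y = {0: 1/3, 1: 2/3}

# The Pearson correlation is zero: Cov(X, Y) = E[XY] - E[X]E[Y] = 0 by symmetry.
mean_x = float(np.dot(xs, p_x))                                   # 0.0
mean_y = sum(y * p for (_, y), p in pairs.items())                # 2/3
cov_xy = sum(p * x * y for (x, y), p in pairs.items()) - mean_x * mean_y
print("Cov(X, Y) =", cov_xy)                                      # 0.0

# Mutual information as the expected pointwise mutual information (PMI), in bits.
mi = sum(p * np.log2(p / ((1/3) * p_y[y])) for (x, y), p in pairs.items())
print(f"I(X;Y) = {mi:.4f} bits")                                  # about 0.918 bits
</syntaxhighlight>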
Line 39: |
| Mutual Information is also known as information gain. |
| | | |
− | Mutual information is also known as information gain.
| + | Mutual information is also known as '''information gain'''. |
| | | |
| | | |
Line 45: |
| | | |
| | | |
| == Definition == |
| | | |
| Let <math>(X,Y)</math> be a pair of random variables with values over the space <math>\mathcal{X}\times\mathcal{Y}</math>. If their joint distribution is <math>P_{(X,Y)}</math> and the marginal distributions are <math>P_X</math> and <math>P_Y</math>, the mutual information is defined as |
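The definition given next in the article is the Kullback-Leibler divergence of the joint distribution from the product of the marginals, <math>\operatorname{I}(X;Y) = D_{\text{KL}}\bigl(P_{(X,Y)} \parallel P_X \otimes P_Y\bigr)</math>. A minimal sketch of that computation for a discrete joint table uses SciPy's <code>entropy</code>, which returns the Kullback-Leibler divergence when given two arguments; the example distribution is arbitrary.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import entropy

# Arbitrary example of a joint distribution P_{(X,Y)} on a 2x3 space.
p_xy = np.array([[0.10, 0.20, 0.15],
                 [0.25, 0.05, 0.25]])

p_x = p_xy.sum(axis=1)                 # P_X
p_y = p_xy.sum(axis=0)                 # P_Y
product = np.outer(p_x, p_y)           # product of the marginals, P_X x P_Y

# I(X;Y) = D_KL( P_{(X,Y)} || P_X x P_Y ), here in bits (base 2).
mi_bits = entropy(p_xy.ravel(), product.ravel(), base=2)
print(f"I(X;Y) = {mi_bits:.4f} bits")
</syntaxhighlight>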
Line 357: |
Line 353: |
| | | |
| | | |
| == Motivation == |
| | | |
| Intuitively, mutual information measures the information that <math>X</math> and <math>Y</math> share: It measures how much knowing one of these variables reduces uncertainty about the other. For example, if <math>X</math> and <math>Y</math> are independent, then knowing <math>X</math> does not give any information about <math>Y</math> and vice versa, so their mutual information is zero. At the other extreme, if <math>X</math> is a deterministic function of <math>Y</math> and <math>Y</math> is a deterministic function of <math>X</math> then all information conveyed by <math>X</math> is shared with <math>Y</math>: knowing <math>X</math> determines the value of <math>Y</math> and vice versa. As a result, in this case the mutual information is the same as the uncertainty contained in <math>Y</math> (or <math>X</math>) alone, namely the [[information entropy|entropy]] of <math>Y</math> (or <math>X</math>). Moreover, this mutual information is the same as the entropy of <math>X</math> and as the entropy of <math>Y</math>. (A very special case of this is when <math>X</math> and <math>Y</math> are the same random variable.) |
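A small numerical check of the two extremes described above, using a direct evaluation of the defining double sum; the probability tables are arbitrary illustrations.

<syntaxhighlight lang="python">
import numpy as np

def entropy_bits(p):
    """Shannon entropy of a probability vector, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_info_bits(p_xy):
    """I(X;Y) = sum over x,y of p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    prod = np.outer(p_x, p_y)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / prod[mask])).sum())

# Independent X and Y: the joint equals the product of the marginals, so I(X;Y) = 0.
p_indep = np.outer([0.3, 0.7], [0.6, 0.4])
print(mutual_info_bits(p_indep))                               # 0.0

# Y a deterministic, invertible function of X (all mass on the diagonal):
# knowing one determines the other, so I(X;Y) = H(X) = H(Y).
p_det = np.diag([0.2, 0.3, 0.5])
print(mutual_info_bits(p_det), entropy_bits([0.2, 0.3, 0.5]))  # both about 1.485 bits
</syntaxhighlight>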
Line 403: |
Line 395: |
| | | |
| | | |
| == Relation to other quantities == |
| | |
| === Nonnegativity === |
| | | |
| Using [[Jensen's inequality]] on the definition of mutual information we can show that <math>\operatorname{I}(X;Y)</math> is non-negative, i.e.<ref name=cover1991 />{{rp|28}} |
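The non-negativity can also be probed empirically; the sketch below draws random joint distributions (Dirichlet samples over a 4x5 alphabet, an arbitrary choice) and confirms that the computed mutual information never falls below zero beyond floating-point error.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def mutual_info_bits(p_xy):
    """Mutual information of a discrete joint table, in bits."""
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / np.outer(p_x, p_y)[mask])).sum())

# Draw random joint distributions on a 4x5 alphabet and check I(X;Y) >= 0.
for _ in range(1000):
    p_xy = rng.dirichlet(np.ones(20)).reshape(4, 5)
    assert mutual_info_bits(p_xy) >= -1e-12
print("I(X;Y) >= 0 held for all sampled joint distributions")
</syntaxhighlight>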
Line 431: |
Line 415: |
| | | |
| | | |
| === Symmetry === |
| | | |
| :<math>\operatorname{I}(X;Y) = \operatorname{I}(Y;X)</math> |
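The symmetry is easy to confirm on data, for example with scikit-learn's <code>mutual_info_score</code>, which estimates mutual information (in nats) from paired samples; the labels below are arbitrary.

<syntaxhighlight lang="python">
from sklearn.metrics import mutual_info_score

# Two paired categorical samples (arbitrary labels for illustration).
x = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
y = ['a', 'a', 'b', 'b', 'b', 'a', 'a', 'b', 'b', 'a']

# mutual_info_score estimates I(X;Y) in nats from the empirical joint counts;
# swapping the arguments gives the same value, since I(X;Y) = I(Y;X).
print(mutual_info_score(x, y))
print(mutual_info_score(y, x))
</syntaxhighlight>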
Line 447: |
Line 427: |
| | | |
| | | |
| === Relation to conditional and joint entropy === |
| | | |
| Mutual information can be equivalently expressed as: |
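The standard identities that follow, <math>\operatorname{I}(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)</math>, can be checked numerically on any joint table; the one below is arbitrary, and <math>H(X|Y)</math> is computed directly from the conditional distributions rather than via the chain rule.

<syntaxhighlight lang="python">
import numpy as np

def H(p):
    """Shannon entropy, in bits, of an array of probabilities (zeros are ignored)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Arbitrary joint distribution p(x, y).
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# Conditional entropies computed directly from the conditional distributions p(x|y), p(y|x).
H_x_given_y = sum(p_y[j] * H(p_xy[:, j] / p_y[j]) for j in range(p_xy.shape[1]))
H_y_given_x = sum(p_x[i] * H(p_xy[i, :] / p_x[i]) for i in range(p_xy.shape[0]))

print(H(p_x) + H(p_y) - H(p_xy))   # I(X;Y) = H(X) + H(Y) - H(X,Y)
print(H(p_x) - H_x_given_y)        # I(X;Y) = H(X) - H(X|Y), same value
print(H(p_y) - H_y_given_x)        # I(X;Y) = H(Y) - H(Y|X), same value
</syntaxhighlight>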
Line 893: |
Line 869: |
| | | |
| | | |
| === Bayesian estimation of mutual information === |
| | | |
| It is well-understood how to do Bayesian estimation of the mutual information |
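One simple scheme of this kind (a sketch of a common Dirichlet-multinomial approach, not necessarily the specific estimators discussed in the literature cited here) places a Dirichlet prior on the unknown joint pmf, samples the posterior given observed counts, and summarizes the induced posterior over <math>\operatorname{I}(X;Y)</math>; the counts and the concentration parameter are made-up assumptions.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)

def mutual_info_bits(p_xy):
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / np.outer(p_x, p_y)[mask])).sum())

# Observed co-occurrence counts n(x, y) (made-up data for illustration).
counts = np.array([[12,  3,  1],
                   [ 2, 10,  4],
                   [ 1,  2,  9]])

# Dirichlet prior with concentration alpha on every cell of the joint pmf;
# the posterior over the pmf is then Dirichlet(counts + alpha).
alpha = 0.5
posterior_params = (counts + alpha).ravel()

# Monte Carlo over the posterior: each draw is a plausible joint pmf,
# and each pmf yields one plausible value of I(X;Y).
samples = [
    mutual_info_bits(rng.dirichlet(posterior_params).reshape(counts.shape))
    for _ in range(5000)
]
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"posterior mean I(X;Y) = {np.mean(samples):.3f} bits, 95% interval [{lo:.3f}, {hi:.3f}]")
</syntaxhighlight>

Smaller values of the assumed concentration parameter put more prior weight on sparse joint tables and tend to shrink the estimate less toward independence.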
Line 965: |
Line 933: |
| | | |
| | | |
| === Independence assumptions === |
| | | |
| The Kullback-Leibler divergence formulation of the mutual information is predicated on the assumption that one is interested in comparing <math>p(x,y)</math> to the fully factorized [[outer product]] <math>p(x) \cdot p(y)</math>. In many problems, such as [[non-negative matrix factorization]], one is interested in less extreme factorizations; specifically, one wishes to compare <math>p(x,y)</math> to a low-rank matrix approximation in some unknown variable <math>w</math>; that is, to what degree one might have |
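To make the contrast concrete, the sketch below builds a joint distribution that is exactly a two-component mixture <math>p(x,y)=\sum_w p(x\mid w)\,p(w)\,p(y\mid w)</math> (with made-up factors, not any particular fitted model) and compares its Kullback-Leibler divergence from the fully factorized product (which is just <math>\operatorname{I}(X;Y)</math>) with its divergence from the richer rank-2 factorization, which is zero by construction.

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import entropy

# A joint distribution that is exactly a two-component mixture
# p(x, y) = sum_w p(x|w) p(w) p(y|w); all factor values are made up.
p_w = np.array([0.6, 0.4])
p_x_given_w = np.array([[0.7, 0.2, 0.1],    # p(x | w=0)
                        [0.1, 0.3, 0.6]])   # p(x | w=1)
p_y_given_w = np.array([[0.5, 0.5],         # p(y | w=0)
                        [0.2, 0.8]])        # p(y | w=1)
p_xy = sum(p_w[w] * np.outer(p_x_given_w[w], p_y_given_w[w]) for w in range(2))

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

# Divergence from the fully factorized product of the marginals: this is I(X;Y).
print(entropy(p_xy.ravel(), np.outer(p_x, p_y).ravel(), base=2))
# Divergence from the rank-2 factorization the joint was built from: exactly 0.
print(entropy(p_xy.ravel(), p_xy.ravel(), base=2))
</syntaxhighlight>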
Line 1,023: |
Line 987: |
| | | |
| | | |
| == Variations == |
| | | |
| Several variations on mutual information have been proposed to suit various needs. Among these are normalized variants and generalizations to more than two variables. |
Line 1,039: |
Line 999: |
| | | |
| | | |
| === Metric === |
| | | |
| Many applications require a [[metric (mathematics)|metric]], that is, a distance measure between pairs of points. The quantity |
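In standard treatments the quantity introduced at this point is the variation of information, <math>d(X,Y) = H(X,Y) - \operatorname{I}(X;Y)</math>, together with a normalized version <math>D(X,Y) = d(X,Y)/H(X,Y)</math>. A minimal sketch of both on an arbitrary joint table, including the sanity check that the distance from a variable to itself is zero:

<syntaxhighlight lang="python">
import numpy as np

def H(p):
    """Shannon entropy in bits; accepts marginal vectors or full joint tables."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(p_xy):
    return H(p_xy.sum(axis=1)) + H(p_xy.sum(axis=0)) - H(p_xy)

# Arbitrary joint distribution for illustration.
p_xy = np.array([[0.25, 0.10, 0.05],
                 [0.05, 0.30, 0.25]])

d = H(p_xy) - mi(p_xy)              # variation of information: a distance between variables
D = d / H(p_xy)                     # normalized variant, bounded by 1
print(f"d(X,Y) = {d:.4f} bits, normalized D(X,Y) = {D:.4f}")

# Sanity check: the distance from a variable to an identical copy of itself is zero.
p_xx = np.diag([0.2, 0.3, 0.5])
print(H(p_xx) - mi(p_xx))           # 0.0
</syntaxhighlight>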
Line 1,201: |
Line 1,157: |
| | | |
| | | |
| === Conditional mutual information === |
| | | |
| {{Main|Conditional mutual information}} |
Line 1,453: |
Line 1,405: |
| | | |
| | | |
| === Multivariate mutual information === |
| | | |
| {{Main|Multivariate mutual information}} |
Line 1,569: |
Line 1,517: |
| | | |
| | | |
| ==== Multivariate statistical independence ==== |
| | | |
| The multivariate mutual-information functions generalize the pairwise independence case, which states that <math>X_1</math> and <math>X_2</math> are independent if and only if <math>I(X_1;X_2)=0</math>, to arbitrarily many variables. n variables are mutually independent if and only if the <math>2^n-n-1</math> mutual information functions vanish, i.e. <math>I(X_1;...;X_k)=0</math> for all <math>n \ge k \ge 2</math> (theorem 2 <ref name=e21090869/>). In this sense, the conditions <math>I(X_1;...;X_k)=0</math> can be used as a refined statistical independence criterion. |
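This refined criterion can be probed numerically. In the sketch below the multivariate terms are computed by inclusion-exclusion over marginal entropies (sign conventions for the three-way quantity differ between authors): three mutually independent bits make every term vanish, while the pairwise-independent XOR triple <math>X_3 = X_1 \oplus X_2</math> leaves a non-zero three-way term, so the criterion correctly reports that the variables are not mutually independent.

<syntaxhighlight lang="python">
import itertools
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_of(joint, axes):
    """Entropy in bits of the marginal of `joint` on the given subset of axes."""
    drop = tuple(a for a in range(joint.ndim) if a not in axes)
    return H(joint.sum(axis=drop))

def interaction_information(joint, axes):
    """Multivariate mutual information I(X_i ; i in axes) by inclusion-exclusion."""
    total = 0.0
    for r in range(1, len(axes) + 1):
        for subset in itertools.combinations(axes, r):
            total += (-1) ** (r + 1) * entropy_of(joint, subset)
    return total

# Case 1: three independent fair bits -> every k-way term (k >= 2) is zero.
indep = np.full((2, 2, 2), 1 / 8)

# Case 2: X1, X2 fair and independent, X3 = X1 XOR X2 -> pairwise terms are zero,
# but the three-way term is -1 bit (in this sign convention), so the variables
# are not mutually independent despite being pairwise independent.
xor = np.zeros((2, 2, 2))
for x1, x2 in itertools.product([0, 1], repeat=2):
    xor[x1, x2, x1 ^ x2] = 1 / 4

for name, joint in [("independent", indep), ("xor", xor)]:
    pair_terms = [interaction_information(joint, ax)
                  for ax in itertools.combinations(range(3), 2)]
    triple_term = interaction_information(joint, (0, 1, 2))
    print(name, [round(t, 6) for t in pair_terms], round(triple_term, 6))
</syntaxhighlight>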
Line 1,585: |
Line 1,529: |
| | | |
| | | |
| ==== Applications ==== |
| | | |
| For 3 variables, Brenner et al. applied multivariate mutual information to neural coding and called its negativity "synergy" <ref>{{cite journal | last1 = Brenner | first1 = N. | last2 = Strong | first2 = S. | last3 = Koberle | first3 = R. | last4 = Bialek | first4 = W. | year = 2000 | title = Synergy in a Neural Code | doi = 10.1162/089976600300015259 | pmid = 10935917 | journal = Neural Comput | volume = 12 | issue = 7 | pages = 1531–1552 }}</ref> and Watkinson et al. applied it to genetic expression <ref>{{cite journal | last1 = Watkinson | first1 = J. | last2 = Liang | first2 = K. | last3 = Wang | first3 = X. | last4 = Zheng | first4 = T.| last5 = Anastassiou | first5 = D. | year = 2009 | title = Inference of Regulatory Gene Interactions from Expression Data Using Three-Way Mutual Information | doi = 10.1111/j.1749-6632.2008.03757.x | pmid = 19348651 | journal = Chall. Syst. Biol. Ann. N. Y. Acad. Sci. | volume = 1158 | issue = 1 | pages = 302–313 | bibcode = 2009NYASA1158..302W | url = https://semanticscholar.org/paper/cb09223a34b08e6dcbf696385d9ab76fd9f37aa4 }}</ref>. For arbitrary k variables, Tapia et al. applied multivariate mutual information to gene expression <ref name=s41598>{{cite journal|last1=Tapia|first1=M.|last2=Baudot|first2=P.|last3=Formizano-Treziny|first3=C.|last4=Dufour|first4=M.|last5=Goaillard|first5=J.M.|year=2018|title=Neurotransmitter identity and electrophysiological phenotype are genetically coupled in midbrain dopaminergic neurons|doi= 10.1038/s41598-018-31765-z|pmid=30206240|pmc=6134142|journal=Sci. Rep.|volume=8|issue=1|pages=13637|bibcode=2018NatSR...813637T}}</ref> <ref name=e21090869/>. It can be zero, positive, or negative <ref>{{cite journal | last1 = Hu| first1 = K.T. | year = 1962 | title = On the Amount of Information | journal = Theory Probab. Appl. | volume = 7 | issue = 4 | pages = 439–447 | doi = 10.1137/1107041 }}</ref>. The positivity corresponds to relations generalizing the pairwise correlations, nullity corresponds to a refined notion of independence, and negativity detects high dimensional "emergent" relations and clusterized datapoints <ref name=s41598/>. |
Line 1,645: |
Line 1,585: |
| | | |
| | | |
| === Directed information === |
| | | |
| [[Directed information]], <math>\operatorname{I}\left(X^n \to Y^n\right)</math>, measures the amount of information that flows from the process <math>X^n</math> to <math>Y^n</math>, where <math>X^n</math> denotes the vector <math>X_1, X_2, ..., X_n</math> and <math>Y^n</math> denotes <math>Y_1, Y_2, ..., Y_n</math>. The term ''directed information'' was coined by [[James Massey]] and is defined as |
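Massey's definition is <math>\operatorname{I}(X^n \to Y^n) = \sum_{i=1}^{n} \operatorname{I}(X^i; Y_i \mid Y^{i-1})</math>. The sketch below evaluates it exactly for a made-up joint distribution over length-2 binary sequences in which the second input copies the first output (a feedback situation), and contrasts it with the ordinary mutual information <math>\operatorname{I}(X^n; Y^n)</math>, which is larger here.

<syntaxhighlight lang="python">
import itertools
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal_entropy(joint, axes):
    """Entropy in bits of the marginal on the given axes of a joint pmf array."""
    drop = tuple(a for a in range(joint.ndim) if a not in axes)
    return H(joint.sum(axis=drop))

def conditional_mi(joint, a, b, c=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (marginal_entropy(joint, tuple(a) + tuple(c))
            + marginal_entropy(joint, tuple(b) + tuple(c))
            - marginal_entropy(joint, tuple(a) + tuple(b) + tuple(c))
            - marginal_entropy(joint, tuple(c)))

# Toy joint over (X1, X2, Y1, Y2): X1 is a fair bit, Y1 = X1 with 10% flips,
# X2 simply repeats the fed-back Y1, and Y2 = X2 with 10% flips.
eps = 0.1
joint = np.zeros((2, 2, 2, 2))
for x1, n1, n2 in itertools.product([0, 1], repeat=3):
    y1 = x1 ^ n1
    x2 = y1                       # feedback: the next input copies the last output
    y2 = x2 ^ n2
    p = 0.5 * (eps if n1 else 1 - eps) * (eps if n2 else 1 - eps)
    joint[x1, x2, y1, y2] += p

# Axis meaning: 0 -> X1, 1 -> X2, 2 -> Y1, 3 -> Y2.
directed = conditional_mi(joint, (0,), (2,)) + conditional_mi(joint, (0, 1), (3,), (2,))
ordinary = conditional_mi(joint, (0, 1), (2, 3))
print(f"I(X^2 -> Y^2) = {directed:.3f} bits   I(X^2; Y^2) = {ordinary:.3f} bits")
</syntaxhighlight>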
Line 1,683: |
Line 1,619: |
| | | |
| | | |
| === Normalized variants === |
| | | |
| Normalized variants of the mutual information are provided by the ''coefficients of constraint'',{{sfn|Coombs|Dawes|Tversky|1970}} [[uncertainty coefficient]]<ref name=pressflannery>{{Cite book|last1=Press|first1=WH|last2=Teukolsky |first2=SA|last3=Vetterling|first3=WT|last4=Flannery|first4=BP|year=2007|title=Numerical Recipes: The Art of Scientific Computing|edition=3rd|publisher=Cambridge University Press|location=New York|isbn=978-0-521-88068-8|chapter=Section 14.7.3. Conditional Entropy and Mutual Information|chapter-url=http://apps.nrbook.com/empanel/index.html#pg=758}}</ref> or proficiency:<ref name=JimWhite>{{Cite conference
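A sketch of a few such normalizations on an arbitrary joint table: the two asymmetric coefficients obtained by dividing <math>\operatorname{I}(X;Y)</math> by <math>H(Y)</math> or <math>H(X)</math>, and one common symmetric variant that divides by the average of the two entropies (names and exact definitions vary across the sources cited here).

<syntaxhighlight lang="python">
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Arbitrary joint distribution for illustration.
p_xy = np.array([[0.30, 0.05, 0.05],
                 [0.05, 0.25, 0.05],
                 [0.05, 0.05, 0.15]])
H_x, H_y, H_xy = H(p_xy.sum(axis=1)), H(p_xy.sum(axis=0)), H(p_xy)
I = H_x + H_y - H_xy

# Asymmetric normalizations: how much of Y's (or X's) uncertainty is explained.
c_xy = I / H_y
c_yx = I / H_x
# One common symmetric normalization: divide by the average of the two entropies.
sym = 2 * I / (H_x + H_y)
print(f"I = {I:.3f} bits, I/H(Y) = {c_xy:.3f}, I/H(X) = {c_yx:.3f}, 2I/(H(X)+H(Y)) = {sym:.3f}")
</syntaxhighlight>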
Line 2,085: |
Line 2,017: |
| | | |
| | | |
| === Adjusted mutual information === |
| | | |
| {{Main|adjusted mutual information}} |
− |
| |
− |
| |
| | | |
| | | |
Line 2,157: |
Line 2,083: |
| | | |
| | | |
| === Linear correlation === |
| | | |
| | | |
Line 2,343: |
Line 2,264: |
| | | |
| | | |
| == Applications == |
| | | |
| In many applications, one wants to maximize mutual information (thus increasing dependencies), which is often equivalent to minimizing [[conditional entropy]]. Examples include: |
Line 2,501: |
Line 2,418: |
| | | |
| | | |
| == See also == |
| | | |
| * [[Pointwise mutual information]] |
Line 2,519: |
Line 2,432: |
| | | |
| | | |
| == Notes == |
| | | |
| <references /> |
Line 2,533: |
Line 2,442: |
| | | |
| | | |
| == References == |
| | | |
| * {{cite journal|last1=Baudot|first1=P.|last2=Tapia|first2=M.|last3=Bennequin|first3=D.|last4=Goaillard|first4=J.M.|title=Topological Information Data Analysis|journal=Entropy|volume=21|issue=9|at=869|year=2019|doi= 10.3390/e21090869|bibcode=2019Entrp..21..869B|arxiv=1907.04242}} |