“信息论 Information theory”的版本间的差异

来自集智百科 - 复杂系统|人工智能|复杂科学|复杂网络|自组织
{{distinguish|Information science}}
  
  
==Quantities of information==

信息的度量
  
 
{{Main|Quantities of information}}
 
 
Information theory is based on [[probability theory]] and statistics.  Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables.  The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy [[Communication channel|channel]] in the limit of long block lengths, when the channel statistics are determined by the joint distribution.
 
信息论基于概率论和统计学,经常涉及与随机变量相关的分布的信息度量。信息论中重要的信息量有:熵(单个随机变量中信息的度量)和互信息(两个随机变量之间共有信息的度量)。熵是随机变量概率分布的一个性质,它给出了由服从给定分布的独立样本所生成的数据能够被可靠压缩的速率上限。互信息是两个随机变量联合分布的一个性质:当信道统计特性由该联合分布确定时,它是在长块长度极限下通过噪声信道进行可靠通信的最大速率。
  
  
The choice of logarithmic base in the following formulae determines the unit of information entropy that is used.  A common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.

在下列公式中,对数底数的选择决定了所用信息熵的单位。信息的常见单位是比特(基于二进制对数)。其他单位包括奈特(nat,基于自然对数)和十进制数字(基于常用对数)。
 
 
  
  
In what follows, an expression of the form {{math|''p'' log ''p''}} is considered by convention to be equal to zero whenever {{math|1=''p'' = 0}}.  This is justified because <math>\lim_{p \rightarrow 0+} p \log p = 0</math> for any logarithmic base.

下文中,按照惯例,每当{{math|1=''p'' = 0}}时,形如{{math|''p'' log ''p''}}的表达式都视为等于零。这是合理的,因为对任何对数底都有<math>\lim_{p \rightarrow 0+} p \log p = 0</math>。
  
  
===Entropy of an information source===

信源的熵
  
 
Based on the [[probability mass function]] of each source symbol to be communicated, the Shannon [[Entropy (information theory)|entropy]] {{math|''H''}}, in units of bits (per symbol), is given by
 
基于每个待传送的源符号的概率质量函数,香农熵{{math|''H''}}(以比特/符号为单位)由下式给出:

:<math>H = - \sum_{i} p_i \log_2 (p_i)</math>
 
  
 
where {{math|''p<sub>i</sub>''}} is the probability of occurrence of the {{math|''i''}}-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the [[Shannon (unit)|shannon]] in his honor. Entropy is also commonly computed using the natural logarithm (base [[E (mathematical constant)|{{mvar|e}}]], where {{mvar|e}} is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base {{nowrap|1=2<sup>8</sup> = 256}} will produce a measurement in [[byte]]s per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.
 
其中{{math|''p<sub>i</sub>''}}是源符号第{{math|''i''}}个可能取值出现的概率。该式使用以2为底的对数,因此给出以“比特”(每符号)为单位的熵;这种以2为底的熵度量单位有时被称为香农([[Shannon (unit)|shannon]]),以纪念香农本人。熵也常用自然对数计算(底数为{{mvar|e}},即欧拉数),此时熵以奈特(每符号)为单位,而且有时可以避免在公式中引入额外常数,从而简化分析。其他底数也是可行的,但较少使用。例如,以{{nowrap|1=2<sup>8</sup> = 256}}为底的对数给出以字节(每符号)为单位的度量,而以10为底的对数给出以十进制数字或哈特利(每符号)为单位的度量。
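上述公式与不同对数底对应的单位,可以用一小段 Python 代码直观演示(此代码并非原文内容,仅作说明;函数名 entropy 为自拟示例):

```python
import math

def entropy(probs, base=2):
    """香农熵 H = -sum(p_i * log_base(p_i)),按惯例约定 0*log(0) = 0。"""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
print(entropy(fair_coin, base=2))       # 以比特(香农)为单位:1.0
print(entropy(fair_coin, base=math.e))  # 以奈特为单位:ln 2 ≈ 0.693
print(entropy(fair_coin, base=10))      # 以哈特利为单位:log10(2) ≈ 0.301
```

同一分布在不同底数下的熵只相差一个常数因子,对应不同的单位。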
 
 
 
 
 
 
  
  
Intuitively, the entropy {{math|''H<sub>X</sub>''}} of a discrete random variable {{math|''X''}} is a measure of the amount of uncertainty associated with the value of {{math|''X''}} when only its distribution is known.

直观地看,离散随机变量{{math|''X''}}的熵{{math|''H<sub>X</sub>''}}度量的是:在仅已知其分布时,与{{math|''X''}}的取值相关联的不确定性的大小。
  
  
The entropy of a source that emits a sequence of {{math|''N''}} symbols that are independent and identically distributed (iid) is {{math|''N'' ⋅ ''H''}} bits (per message of {{math|''N''}} symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length {{math|''N''}} will be less than {{math|''N'' ⋅ ''H''}}.

发出{{math|''N''}}个[[独立同分布]](iid)符号序列的信源,其熵为{{math|''N'' ⋅ ''H''}}比特(每条消息含{{math|''N''}}个符号)。如果源数据符号同分布但不独立,则长度为{{math|''N''}}的消息的熵将小于{{math|''N'' ⋅ ''H''}}。
  
  
[[File:Binary entropy plot.svg|thumbnail|right|200px|[[Bernoulli trial]]的熵作为成功概率的函数,通常称作{{em|[[binary entropy function]]}},{{math|''H''<sub>b</sub>(''p'')}}。当两种可能结果发生的概率相等时(例如投掷无偏硬币),每次试验的熵最大,为1比特。]]
  
  
  
  
If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted.  If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted.  Between these two extremes, information can be quantified as follows. If 𝕏 is the set of all messages {{math|{{mset|''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub>}}}} that {{math|''X''}} could be, and {{math|''p''(''x'')}} is the probability of some <math>x \in \mathbb X</math>, then the entropy, {{math|''H''}}, of {{math|''X''}} is defined:<ref name = Reza>{{cite book | title = An Introduction to Information Theory | author = Fazlollah M. Reza | publisher = Dover Publications, Inc., New York | origyear = 1961| year = 1994 | isbn = 0-486-68210-2 | url = https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22}}</ref>
如果发送1000比特(0和1),而接收者在传输之前就已确知每一比特的取值(即每个值都确定无疑),那么显然没有传递任何信息。反之,若每一比特独立且等可能地为0或1,则传递了1000香农的信息(更常称为比特)。在这两个极端之间,信息量可以如下量化:如果𝕏是{{math|''X''}}所有可能消息的集合{{math|{{mset|''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub>}}}},且{{math|''p''(''x'')}}是某个<math>x \in \mathbb X</math>出现的概率,那么{{math|''X''}}的熵{{math|''H''}}定义为:<ref name = Reza>{{cite book | title = An Introduction to Information Theory | author = Fazlollah M. Reza | publisher = Dover Publications, Inc., New York | origyear = 1961| year = 1994 | isbn = 0-486-68210-2 | url = https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22}}</ref>
 
  
  
:<math> H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x).</math>
  
  
(Here, {{math|''I''(''x'')}} is the [[self-information]], which is the entropy contribution of an individual message, and {{math|𝔼<sub>''X''</sub>}} is the [[expected value]].) A property of entropy is that it is maximized when all the messages in the message space are equiprobable {{math|1=''p''(''x'') = 1/''n''}}; i.e., most unpredictable, in which case {{math|1=''H''(''X'') = log ''n''}}.

(其中{{math|''I''(''x'')}}是[[自信息]],即单条消息的熵贡献;{{math|𝔼<sub>''X''</sub>}}为[[期望值]]。)熵的一个性质是:当消息空间中所有消息等概率出现(即{{math|1=''p''(''x'') = 1/''n''}},此时最不可预测)时,熵取得最大值{{math|1=''H''(''X'') = log ''n''}}。
  
  
The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:

只有两种结果的随机变量的信息熵是一种特殊情况,称为二元熵函数,通常取以2为底的对数,因此以香农(Sh)为单位:
  
  
:<math>H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p).</math>
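二元熵函数可以用下面的 Python 示例直接计算(非原文内容,仅作演示;函数名 binary_entropy 为自拟):

```python
import math

def binary_entropy(p):
    """二元熵函数 H_b(p) = -p*log2(p) - (1-p)*log2(1-p),约定 0*log(0) = 0。"""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 两种结果等概率时熵最大:1.0 比特
print(binary_entropy(0.9))  # 偏置越大,每次试验携带的信息越少
```

与右图一致,该函数关于 p = 0.5 对称,并在两端(p = 0 或 1)降为零。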
  
  
The {{em|joint entropy}} of two discrete random variables {{math|''X''}} and {{math|''Y''}} is merely the entropy of their pairing: {{math|(''X'', ''Y'')}}.  This implies that if {{math|''X''}} and {{math|''Y''}} are independent, then their joint entropy is the sum of their individual entropies.

两个离散随机变量{{math|''X''}}和{{math|''Y''}}的{{em|联合熵}}就是其配对{{math|(''X'', ''Y'')}}的熵。这意味着:如果{{math|''X''}}和{{math|''Y''}}相互独立,那么它们的联合熵就是各自熵之和。
  
  
For example, if {{math|(''X'', ''Y'')}} represents the position of a chess piece — {{math|''X''}} the row and {{math|''Y''}} the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

例如:如果{{math|(''X'', ''Y'')}}代表一枚棋子的位置({{math|''X''}}表示行,{{math|''Y''}}表示列),那么棋子所在行与所在列的联合熵就是棋子位置的熵。
  
  
:<math>H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \,</math>
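上面棋子的例子可以数值验证(示例代码,非原文内容;这里假设棋子在 8×8 棋盘上均匀分布,行与列相互独立):

```python
import math

def joint_entropy(pxy):
    """联合熵 H(X,Y) = -sum p(x,y) * log2 p(x,y)。pxy 为 {(x, y): 概率} 字典。"""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

# 棋子在 8x8 棋盘上均匀分布:行、列各自均匀且独立
pxy = {(r, c): 1 / 64 for r in range(8) for c in range(8)}
print(joint_entropy(pxy))  # 6.0 比特 = H(行) + H(列) = 3 + 3
```

行、列独立时联合熵恰为各自熵之和,与正文中的结论一致。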
  
  
Despite similar notation, joint entropy should not be confused with ''cross entropy''.

尽管符号相似,但联合熵不应与交叉熵(cross entropy)混淆。
  
  
===Conditional entropy (equivocation)===

条件熵(含糊度)
  
 
The {{em|[[conditional entropy]]}} or ''conditional uncertainty'' of {{math|''X''}} given random variable {{math|''Y''}} (also called the ''equivocation'' of {{math|''X''}} about {{math|''Y''}}) is the average conditional entropy over {{math|''Y''}}:<ref name=Ash>{{cite book | title = Information Theory | author = Robert B. Ash | publisher = Dover Publications, Inc. | origyear = 1965| year = 1990 | isbn = 0-486-66521-6 | url = https://books.google.com/books?id=ngZhvUfF0UIC&pg=PA16&dq=intitle:information+intitle:theory+inauthor:ash+conditional+uncertainty}}</ref>
 
在给定随机变量{{math|''Y''}}的条件下,{{math|''X''}}的{{em|条件熵}}(或称条件不确定性,也叫{{math|''X''}}关于{{math|''Y''}}的含糊度)是对{{math|''Y''}}取平均的条件熵:<ref name=Ash>{{cite book | title = Information Theory | author = Robert B. Ash | publisher = Dover Publications, Inc. | origyear = 1965| year = 1990 | isbn = 0-486-66521-6 | url = https://books.google.com/books?id=ngZhvUfF0UIC&pg=PA16&dq=intitle:information+intitle:theory+inauthor:ash+conditional+uncertainty}}</ref>
 
  
  
:<math> H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y).</math>
  
  
Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use.  A basic property of this form of conditional entropy is that:

由于熵既能以某个随机变量为条件,也能以该随机变量取某个特定值为条件,应注意不要混淆条件熵的这两种定义(前者更为常用)。这种形式的条件熵有一个基本性质:
  
  
: <math> H(X|Y) = H(X,Y) - H(Y) .\,</math>
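该恒等式可以在一个小的联合分布上数值验证(示例代码,非原文内容;分布 pxy 为自拟):

```python
import math

# 一个自拟的二元联合分布 p(x, y)
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
py = {y: sum(p for (x2, y2), p in pxy.items() if y2 == y) for y in (0, 1)}

# 按定义直接计算 H(X|Y) = -sum p(x,y) * log2 p(x|y),其中 p(x|y) = p(x,y)/p(y)
h_x_given_y = -sum(p * math.log2(p / py[y]) for (x, y), p in pxy.items())

# 按恒等式计算 H(X,Y) - H(Y)
h_xy = -sum(p * math.log2(p) for p in pxy.values())
h_y = -sum(p * math.log2(p) for p in py.values())

print(abs(h_x_given_y - (h_xy - h_y)) < 1e-9)  # True:两种算法一致
```

两种算法的代数展开完全相同,差异仅来自浮点舍入。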
  
  
===Mutual information (transinformation)===

互信息(转移信息)
  
 
''[[Mutual information]]'' measures the amount of information that can be obtained about one random variable by observing another.  It is important in communication where it can be used to maximize the amount of information shared between sent and received signals.  The mutual information of {{math|''X''}} relative to {{math|''Y''}} is given by:
 
互信息度量的是通过观察一个随机变量而能获得的关于另一个随机变量的信息量。它在通信中十分重要,可用于最大化发送信号与接收信号之间共享的信息量。{{math|''X''}}相对于{{math|''Y''}}的互信息由下式给出:
  
  
:<math>I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}</math>
  
 
where {{math|SI}} (''S''pecific mutual ''I''nformation) is the [[pointwise mutual information]].
 
其中{{math|SI}}(''S''pecific mutual ''I''nformation,特定互信息)是[[pointwise mutual information|点互信息]]。
  
  
A basic property of the mutual information is that

互信息的一个基本属性为:
  
 
: <math>I(X;Y) = H(X) - H(X|Y).\,</math>
 
  
 
That is, knowing ''Y'', we can save an average of {{math|''I''(''X''; ''Y'')}} bits in encoding ''X'' compared to not knowing ''Y''.
 
也就是说,与不知道''Y''相比,知道''Y''之后在编码''X''时平均可以节省{{math|''I''(''X''; ''Y'')}}比特。
  
  
: <math>I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\,</math>
  
  
Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of X given the value of Y and the prior distribution on X:

互信息可以表示为:给定''Y''的取值时''X''的后验概率分布,与''X''的先验分布之间的平均 Kullback–Leibler 散度(信息增益):
  
 
: <math>I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )].</math>
 
  
 
In other words, this is a measure of how much, on the average, the probability distribution on ''X'' will change if we are given the value of ''Y''.  This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:
 
换句话说,它度量的是:若已知''Y''的取值,''X''上的概率分布平均会改变多少。这通常被重新表述为由边际分布之积到实际联合分布的散度:
  
 
: <math>I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)).</math>
 
  
  
 
Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ<sup>2</sup> test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.
 
  
在列联表和多项分布的背景下,互信息与对数似然比检验密切相关,也与皮尔森 χ<sup>2</sup> 检验密切相关:互信息可以视为评估一对变量之间独立性的统计量,并且具有明确给定的渐近分布。
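上述关系可以用一段简短的 Python 代码来示意:按公式 I(X;Y) = D<sub>KL</sub>(p(X,Y) ‖ p(X)p(Y)) 从一个离散联合分布直接计算互信息(示意性实现,函数名为本例自拟):

```python
from math import log2

def kl_divergence(p, q):
    """离散分布 p 相对于 q 的 Kullback-Leibler 散度(单位:比特)。"""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_information(joint):
    """joint[x][y] 为联合概率 p(x, y);返回 I(X;Y) = D_KL(p(X,Y) || p(X)p(Y))。"""
    px = [sum(row) for row in joint]                # 边缘分布 p(x)
    py = [sum(col) for col in zip(*joint)]          # 边缘分布 p(y)
    flat_joint = [pxy for row in joint for pxy in row]
    flat_prod = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    return kl_divergence(flat_joint, flat_prod)

# 完全相关的两个二元变量:互信息为 1 比特
assert abs(mutual_information([[0.5, 0.0], [0.0, 0.5]]) - 1.0) < 1e-12
# 相互独立的两个二元变量:互信息为 0
assert abs(mutual_information([[0.25, 0.25], [0.25, 0.25]])) < 1e-12
```

完全相关时得到的 1 比特,恰好等于单个均匀二元变量的熵。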
  
  
 
===Kullback–Leibler divergence (information gain)===
 
Kullback-Leibler 散度(信息增益)
  
 
The ''[[Kullback–Leibler divergence]]'' (or ''information divergence'', ''information gain'', or ''relative entropy'') is a way of comparing two distributions: a "true" [[probability distribution]] ''p(X)'', and an arbitrary probability distribution ''q(X)''. If we compress data in a manner that assumes ''q(X)'' is the distribution underlying some data, when, in reality, ''p(X)'' is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression.  It is thus defined
 
  
Kullback-Leibler 散度(又称信息散度、信息增益或相对熵)是比较两个分布的一种方式:一个"真实的"概率分布 ''p(X)'' 和一个任意的概率分布 ''q(X)''。如果我们在压缩数据时假设 ''q(X)'' 是数据背后的分布,而实际上 ''p(X)'' 才是正确的分布,那么 Kullback-Leibler 散度就是压缩时每个数据平均需要的额外比特数。它由此定义为:
  
  
 
:<math>D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.</math>
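该定义正是"交叉熵减去熵":D<sub>KL</sub>(p‖q) = H(p, q) − H(p),即用错误分布 q 设计编码时每个符号平均多付出的比特数。下面用一段示意性的 Python 代码验证这一点(变量与函数均为本例自拟):

```python
from math import log2

def entropy(p):
    """分布 p 的香农熵(比特)。"""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """数据实际服从 p、却按 q 设计编码时,每个符号的平均码长(比特)。"""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # 真实分布
q = [1/3, 1/3, 1/3]     # 压缩时错误假设的分布
extra_bits = cross_entropy(p, q) - entropy(p)   # 每个符号多付出的比特数
assert abs(extra_bits - kl_divergence(p, q)) < 1e-12
# KL 散度不对称,因而不是真正的距离度量
assert kl_divergence(p, q) != kl_divergence(q, p)
```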
  
  
 
Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).
 
  
尽管 KL 散度有时被当作一种"距离度量"使用,但它并不是真正的度量,因为它不对称,也不满足三角不等式(因此它是一个半拟度量)。
  
  
 
Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x).  If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X.  The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2.  In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
 
  
KL 散度的另一种解释是偏离事实的先验所带来的"不必要的惊讶":假设要从一个概率分布为 ''p(x)'' 的离散集合中随机抽取一个数 ''X''。如果 Alice 知道真实分布 ''p(x)'',而 Bob 认为(即其先验为)分布是 ''q(x)'',那么平均而言,Bob 在看到 ''X'' 的值时会比 Alice 更惊讶。KL 散度就是 Bob 的(主观)惊讶度减去 Alice 的惊讶度的(客观)期望值(若对数以 2 为底,则以比特为单位)。这样,Bob 的先验"错误"的程度,就可以用预期使他"不必要地惊讶"的程度来量化。
 
 
 
 
  
  
 
===Other quantities===
 
其他度量
  
 
Other important information theoretic quantities include [[Rényi entropy]] (a generalization of entropy), [[differential entropy]] (a generalization of quantities of information to continuous distributions), and the [[conditional mutual information]].
 
  
信息论中其他重要的量包括 Rényi 熵(熵的一种推广)、微分熵(将信息量推广到连续分布)以及条件互信息。
  
  
 
A picture showing scratches on the readable surface of a CD-R.  Music and data CDs are coded using error correcting codes and thus can still be read even if they have minor scratches using [[error detection and correction.]]
 
  
[[File:CDSCRATCHES.jpg|thumb|right|在可读CD的表面上显示划痕的图片。音乐和数据CD使用纠错编码进行编码,因此即使它们有轻微的划痕,也可以通过错误检测和纠正来对CD进行读取。]]
  
  
 
Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
 
  
编码理论是信息论最重要、最直接的应用之一,可以细分为信源编码理论和信道编码理论。信息论利用对数据的统计描述,量化描述数据所需的比特数,即信源的信息熵。
 
 
  
  
  
 
* Data compression (source coding): There are two formulations for the compression problem:
 
*数据压缩(信源编码):压缩问题有两种表述:
  
  
 
*[[lossless data compression]]: the data must be reconstructed exactly;
 
*[[无损数据压缩]]:数据必须被精确地重构;
  
  
 
*[[lossy data compression]]: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called ''[[rate–distortion theory]]''.
 
*[[有损数据压缩]]:在由失真函数衡量的指定保真度水平内,分配重构数据所需的比特数。信息论的这一分支称为''[[率失真理论]]''。
  
  
 
* Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.
 
*纠错码(信道编码):数据压缩尽可能地去除冗余,而纠错码则加入恰当类型的冗余(即纠错能力),以便在有噪信道上高效而可靠地传输数据。
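信源编码一侧的含义可以用一个很小的例子体会:无损编码的平均码长不会低于信源熵 H(X),而 Huffman 码能把平均码长压到 H(X) 与 H(X)+1 之间。下面是一段示意性的 Python 实现(仅作演示,非正文算法):

```python
import heapq
from math import log2

def huffman_code(probs):
    """probs: {符号: 概率};返回前缀码 {符号: 0/1 码字}。"""
    # 堆元素为 (概率, 唯一序号, {符号: 部分码字});序号用来避免比较字典
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # 合并两个概率最小的节点
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
H = -sum(p * log2(p) for p in probs.values())    # 信源熵
L = sum(probs[s] * len(code[s]) for s in probs)  # 平均码长
# 对这种 2 的负幂次分布,Huffman 码恰好达到熵:H = L = 1.75 比特
```

一般情形下有 H ≤ L < H + 1;对上面这种二进制幂次的分布,平均码长恰好等于熵。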
  
  
 
This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.
 
  
信息传输定理(或称信源-信道分离定理)为把编码理论划分为压缩和传输这两部分提供了依据,这些定理也证明了在许多场合下把比特当作信息的通用"货币"是合理的。然而,这些定理只在一个发送用户希望与一个接收用户通信的情形下成立。在存在多个发送端(多址接入信道)、多个接收端(广播信道)或中间"协助者"(中继信道)的场景中,或者在更一般的网络中,先压缩再传输可能不再是最优的。[[网络信息论]]研究的就是这些多主体通信模型。
 
 
 
 
  
  
 
===Source theory===
 
源理论
  
 
Any process that generates successive messages can be considered a {{em|[[Communication source|source]]}} of information.  A memoryless source is one in which each message is an [[Independent identically distributed random variables|independent identically distributed random variable]], whereas the properties of [[ergodic theory|ergodicity]] and [[stationary process|stationarity]] impose less restrictive constraints.  All such sources are [[stochastic process|stochastic]].  These terms are well studied in their own right outside information theory.
 
 
 
任何生成连续消息的过程都可以视为信息的信源。无记忆信源是指每条消息都是独立同分布随机变量的信源,而遍历性和平稳性所施加的约束则更宽松。所有这类信源都是随机过程。这些术语本身在信息论之外也有深入的研究。
  
  
 
====Rate====<!-- This section is linked from Channel capacity -->
 
====速率====
 
 
 
Information ''[[Entropy rate|rate]]'' is the average entropy per symbol.  For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is
 
  
信息速率是每个符号的平均熵。对于无记忆信源,它就是每个符号的熵;而对于平稳随机过程,它是:
  
  
 
:<math>r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);</math>
  
  
 
that is, the conditional entropy of a symbol given all the previous symbols generated.  For the more general case of a process that is not necessarily stationary, the average rate is
 
  
也就是说,它是在已生成的所有先前符号给定的条件下,一个符号的条件熵。对于过程不一定平稳这一更一般的情形,平均速率为:
  
  
 
:<math>r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n);</math>
  
  
 
that is, the limit of the joint entropy per symbol.  For stationary sources, these two expressions give the same result.
 
  
也就是每个符号的联合熵的极限。对于平稳信源,这两个表达式给出相同的结果。<ref>{{cite book | title = Digital Compression for Multimedia: Principles and Standards | author = Jerry D. Gibson | publisher = Morgan Kaufmann | year = 1998 | url = https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1 | isbn = 1-55860-369-7 }}</ref>
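对马尔可夫信源,上面的极限有闭式形式:熵率等于按平稳分布加权的各状态转移行的熵。下面用一个两状态平稳马尔可夫链做数值示意(转移矩阵与平稳分布均为本例假设):

```python
from math import log2

P = [[0.9, 0.1],
     [0.4, 0.6]]        # 转移概率 P[i][j] = p(下一状态 j | 当前状态 i)
pi = [0.8, 0.2]         # 平稳分布,满足 pi · P = pi

def h(dist):
    """一个离散分布的熵(比特)。"""
    return -sum(p * log2(p) for p in dist if p > 0)

# 熵率 r = Σ_i pi_i · H(下一符号 | 当前符号 = i)
rate = sum(pi[i] * h(P[i]) for i in range(len(pi)))

# 与 (1/n) H(X_1,…,X_n) 对照:对平稳马尔可夫链,由链式法则
# H(X_1,…,X_n) = H(X_1) + (n − 1) · rate,故每符号平均熵随 n → ∞ 收敛到 rate
n = 1000
avg_per_symbol = (h(pi) + (n - 1) * rate) / n
```

注意熵率(约 0.57 比特)低于平稳分布本身的熵 h(pi):信源的记忆降低了每个符号的不确定性。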
 
  
  
 
It is common in information theory to speak of the "rate" or "entropy" of a language.  This is appropriate, for example, when the source of information is English prose.  The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of .
 
  
在信息论中,谈论一种语言的"速率"或"熵"是常见的;例如,当信源是英文散文时,这种说法就很合适。信源的速率与其冗余度以及可被压缩的程度有关。
  
  
 
Communications over a channel—such as an ethernet cable—is the primary motivation of information theory.  However, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.
 
  
通过信道(例如以太网电缆)进行通信是信息论的主要动机。然而,这样的信道往往无法精确重建信号;噪声、静默期以及其他形式的信号损坏常常使质量下降。
  
  
 
Consider the communications process over a discrete channel. A simple model of the process is shown below:
 
  
考虑离散信道上的通信过程。该过程的简单模型如下:
  
  
 
Channel model
 
[[File:Channel model.svg|center|800px|信道模型]]
  
  
 
Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel. Let {{math|''p''(''y''{{pipe}}''x'')}} be the conditional probability distribution function of ''Y'' given ''X''. We will consider {{math|''p''(''y''{{pipe}}''x'')}} to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of {{math|''f''(''x'')}}, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:
  
这里 ''X'' 表示发送的消息空间,''Y'' 表示单位时间内通过信道接收到的消息空间。设 {{math|''p''(''y''{{pipe}}''x'')}} 为给定 ''X'' 时 ''Y'' 的条件概率分布函数。我们将 {{math|''p''(''y''{{pipe}}''x'')}} 视为通信信道固有的固定属性(代表信道噪声的性质)。那么 ''X'' 和 ''Y'' 的联合分布完全由信道以及我们选择通过信道发送的消息的边缘分布 {{math|''f''(''x'')}} 决定。在这些约束条件下,我们希望最大化能够通过信道传递的信息速率,即信号速率。对此合适的度量是互信息,而这个最大互信息被称为信道容量,由下式给出:
 
  
 
:<math> C = \max_{f} I(X;Y).\! </math>
 
 
  
 
This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol).  For any information rate ''R < C'' and coding error ε > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error.  In addition, for any rate ''R &gt; C'', it is impossible to transmit with arbitrarily small block error.
 
 
  
信道容量具有以下与以信息速率 ''R''(''R'' 通常为每符号比特数)进行通信相关的性质:对于任意信息速率 ''R'' &lt; ''C'' 和编码错误 ε &gt; 0,当 ''N'' 足够大时,存在长度为 ''N''、速率 ≥ R 的码以及相应的译码算法,使得块错误的最大概率 ≤ ε;也就是说,总可以以任意小的块错误概率进行传输。此外,对于任何速率 ''R'' &gt; ''C'',都不可能以任意小的块错误概率进行传输。
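容量定义 C = max<sub>f</sub> I(X;Y) 可以数值地体会:对交叉概率为 p 的二进制对称信道,在输入分布上用网格搜索最大化 I(X;Y) = H(Y) − H(Y|X),最优解落在均匀输入处,且等于 1 − H<sub>b</sub>(p)。以下为示意性的 Python 代码(数值验证,非严格证明):

```python
from math import log2

def hb(p):
    """二进制熵函数 H_b(p),以 2 为底。"""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_info_bsc(a, p):
    """输入分布 P(X=1)=a、交叉概率 p 的 BSC 上的 I(X;Y) = H(Y) − H(Y|X)。"""
    py1 = a * (1 - p) + (1 - a) * p      # 输出为 1 的概率
    return hb(py1) - hb(p)               # 对 BSC,H(Y|X) = H_b(p) 与输入无关

p = 0.1
best_a = max((i / 1000 for i in range(1001)), key=lambda a: mutual_info_bsc(a, p))
capacity = mutual_info_bsc(best_a, p)
# 最优输入为均匀分布 best_a = 0.5,容量 = 1 − H_b(0.1) ≈ 0.531 比特/信道使用
```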
 
 
  
  
 
Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.
 
  
信道编码所关心的,就是寻找这样接近最优的编码:以接近信道容量的速率在有噪信道上传输数据,同时编码错误很小。
  
  
 
====Capacity of particular channel models====
 
特定信道模型的容量
  
 
* A continuous-time analog communications channel subject to [[Gaussian noise]] — see [[Shannon–Hartley theorem]].
 
*受[[高斯噪声]]影响的连续时间模拟通信信道,参见 [[Shannon–Hartley theorem|Shannon–Hartley 定理]]。
  
 
* A [[binary symmetric channel]] (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability ''p''. The BSC has a capacity of {{math|1 &minus; ''H''<sub>b</sub>(''p'')}} bits per channel use, where {{math|''H''<sub>b</sub>}} is the binary entropy function to the base-2 logarithm:
 
*交叉概率为 ''p'' 的[[binary symmetric channel|二进制对称信道]](BSC)是二进制输入、二进制输出的信道,它以概率 ''p'' 翻转输入比特。每次使用该信道的容量为 {{math|1 &minus; ''H''<sub>b</sub>(''p'')}} 比特,其中 {{math|''H''<sub>b</sub>}} 是以 2 为底的二进制熵函数:
  
  
 
File:Binary symmetric channel.svg
 
::[[File:Binary symmetric channel.svg]]
  
  
  
 
* A [[binary erasure channel]] (BEC) with erasure probability ''p'' is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is {{nowrap|1 &minus; ''p''}} bits per channel use.
 
* A [[binary erasure channel]] (BEC) with erasure probability ''p'' is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is {{nowrap|1 &minus; ''p''}} bits per channel use.
 
+
*二进制擦除通道([[binary erasure channel]],BEC)是擦除概率为“ p”的二进制输入、三进制输出通道。可能的通道输出为0、1和擦除符号'e'。擦除表示信息输入位的完全丢失。每个通道使用的BEC容量为{{nowrap|1 &minus; ''p''}}比特。
  
  
 
File:Binary erasure channel.svg
 
::[[File:Binary erasure channel.svg]]
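这两种信道的容量都有闭式表达,可以直接计算并比较(示意代码,函数名为本例自拟):

```python
from math import log2

def hb(p):
    """二进制熵函数(比特)。"""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """交叉概率为 p 的二进制对称信道容量:1 − H_b(p)。"""
    return 1.0 - hb(p)

def bec_capacity(p):
    """擦除概率为 p 的二进制擦除信道容量:1 − p。"""
    return 1.0 - p

# 同样的参数 p 下,擦除造成的容量损失小于翻转:
# bsc_capacity(0.11) ≈ 0.500,而 bec_capacity(0.11) = 0.89
```

直观上,擦除信道会告诉接收端哪些比特丢失了,而对称信道的翻转是"无声的",因此同样的 p 下 BSC 的容量更低。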
  
  
 
===Intelligence uses and secrecy applications===
 
情报用途与保密应用
  
 
Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the [[Ban (unit)|ban]], was used in the [[Ultra]] project, breaking the German [[Enigma machine]] code and hastening the [[Victory in Europe Day|end of World War II in Europe]].  Shannon himself defined an important concept now called the [[unicity distance]]. Based on the redundancy of the [[plaintext]], it attempts to give a minimum amount of [[ciphertext]] necessary to ensure unique decipherability.
 
 
  
信息论的概念可应用于密码学与密码分析。图灵的信息单位 [[Ban (unit)|ban]] 曾用于 [[Ultra]] 计划,帮助破译德国[[恩尼格玛密码机]]的密码,加速了第二次世界大战在欧洲的结束。香农本人定义了一个重要概念,现称为[[唯一解距离]](unicity distance)。它基于[[明文]]的冗余度,试图给出保证唯一可解密所需的最少[[密文]]量。
 
 
  
  
 
Information theory leads us to believe it is much more difficult to keep secrets than it might first appear.  A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers.  The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.
 
  
信息论告诉我们,保守秘密比乍看起来要困难得多。穷举攻击可以攻破基于非对称密钥算法或最常用的对称密钥算法(也称秘密密钥算法,例如分组密码)的系统。目前所有这些方法的安全性都基于这样一个假设:在可行的时间内,没有已知的攻击能够攻破它们。
 
 
  
  
 
Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks.  In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications.  In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.
 
Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks.  In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications.  In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.
  
信息理论安全指的是诸如一次性垫子这样的方法,它不容易受到这种蛮力攻击。在这种情况下,明文和密文(以密钥为条件)之间的正向有条件的互信息可以确保适当的传输,而明文和密文之间的无条件互信息仍然为零,从而导致绝对安全的通信。换句话说,窃听者不能通过获得密文的知识而不是密钥来改进他或她对明文的猜测。但是,如同任何其他密码系统一样,必须小心谨慎,甚至正确应用信息——理论上安全的方法; Venona 项目能够破解苏联的一次性密封垫,因为它们不适当地重复使用关键材料。
+
信息论安全性指的是诸如一次性密码本之类不易受到此类暴力攻击的方法。在这种情况下,明文与密文之间(以密钥为条件)的正条件互信息可以确保正确传输,而明文与密文之间的无条件互信息保持为零,从而实现绝对安全的通信。换句话说,窃听者即使获知密文(但不知道密钥),也无法改进其对明文的猜测。然而,与任何其他密码系统一样,即使是信息论意义上安全的方法,也必须小心地正确使用;Venona 项目之所以能够破解苏联的一次性密码本,正是因为苏联不当地重复使用了密钥材料。
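The one-time pad itself can be sketched in a few lines (an illustrative toy, not production cryptography; the message strings are invented for the example). The last two lines show why the unconditional mutual information between plaintext and ciphertext is zero: any equal-length plaintext is consistent with the observed ciphertext under some key.

```python
import secrets

# One-time pad: a uniformly random key as long as the message, used once.
def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    assert len(key) == len(plaintext), "key must be as long as the message"
    return bytes(p ^ k for p, k in zip(plaintext, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))
ciphertext = otp_encrypt(message, key)

# Decryption is the same XOR with the same key:
assert otp_encrypt(ciphertext, key) == message

# For ANY other plaintext of the same length there exists a key producing
# this very ciphertext, so the ciphertext alone reveals nothing:
other = b"flee at once"
fake_key = bytes(c ^ p for c, p in zip(ciphertext, other))
assert otp_encrypt(ciphertext, fake_key) == other
```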
 
 
  
  
===Pseudorandom number generation===

伪随机数的生成
  
 
[[Pseudorandom number generator]]s are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed [[cryptographically secure pseudorandom number generator]]s, but even they require [[random seed]]s external to the software to work as intended. These can be obtained via [[Extractor (mathematics)|extractors]], if done carefully. The measure of  sufficient randomness in extractors is [[min-entropy]], a value related to Shannon entropy through [[Rényi entropy]]; Rényi entropy is also used in evaluating randomness in cryptographic systems.  Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.

伪随机数生成器在计算机语言库和应用程序中广泛可用。但它们几乎普遍不适合密码学用途,因为它们无法规避现代计算机设备和软件的确定性本质。一类改进的随机数生成器称为密码学安全的伪随机数生成器,但即便是它们,也需要软件外部的随机种子才能按预期工作。如果处理得当,这些随机种子可以通过提取器获得。衡量提取器中随机性是否充分的度量是最小熵,该值通过[[Rényi熵]]与Shannon熵相关联;Rényi熵也用于评估密码系统中的随机性。尽管这些度量彼此相关,它们之间的区别意味着具有较高Shannon熵的随机变量并不一定适合在提取器中使用,因而也不一定适合密码学用途。
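The distinction between Shannon entropy and min-entropy mentioned above can be made concrete numerically (the example distribution is invented for illustration): a source can have fairly high Shannon entropy while its min-entropy, which is what extractor analysis uses, stays low.

```python
import math

def shannon_entropy(p):
    # H = -sum_i p_i log2 p_i
    return -sum(x * math.log2(x) for x in p if x > 0)

def min_entropy(p):
    # H_min = -log2(max_i p_i), the Rényi entropy of order infinity
    return -math.log2(max(p))

# A skewed 16-outcome source: one outcome takes half the mass.
skewed = [0.5] + [0.5 / 15] * 15
assert shannon_entropy(skewed) > 2.9   # ≈ 2.95 bits: looks fairly random
assert min_entropy(skewed) == 1.0      # but only 1 bit of min-entropy

# For a uniform distribution the two measures coincide:
uniform = [1 / 16] * 16
assert min_entropy(uniform) == shannon_entropy(uniform) == 4.0
```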
 
 
 
 
  
  
One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.

信息论的一个早期商业应用是在地震石油勘探领域。在该领域的应用可以从期望的地震信号中剔除和分离不需要的噪声。与以前的模拟方法相比,信息论和数字信号处理大大提高了图像的分辨率和清晰度。<ref>{{cite journal|doi=10.1002/smj.4250020202 | volume=2 | issue=2 | title=The corporation and innovation | year=1981 | journal=Strategic Management Journal | pages=97–118 | last1 = Haggerty | first1 = Patrick E.}}</ref>
 
 
  
  
Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics. Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing."

符号学家[[:nl:Doede Nauta|Doede Nauta]][[Winfried Nöth]]都认为[[Charles Sanders Peirce]]在他的符号学著作中创造了信息论。<ref name="Nauta 1972">{{cite book |ref=harv |last1=Nauta |first1=Doede |title=The Meaning of Information |date=1972 |publisher=Mouton |location=The Hague |isbn=9789027919960}}</ref>{{rp|171}}<ref name="Nöth 2012">{{cite journal |ref=harv |last1=Nöth |first1=Winfried |title=Charles S. Peirce's theory of information: a theory of the growth of symbols and of knowledge |journal=Cybernetics and Human Knowing |date=January 2012 |volume=19 |issue=1–2 |pages=137–161 |url=https://edisciplinas.usp.br/mod/resource/view.php?id=2311849}}</ref>{{rp|137}} Nauta将符号信息论定义为研究编码、过滤和信息处理的内部过程。<ref name="Nauta 1972"/>{{rp|91}}
 
  
  
信息论的概念(例如冗余和代码控制)已被符号学家如Umberto Eco和Ferruccio Rossi-Landi用来解释意识形态,将其作为消息传输的一种形式,占统治地位的社会阶层通过使用具有高度冗余性的标志来发出其信息,从而在一系列相互竞争的标志中只有一个信息被解码。
  
  
===Miscellaneous applications===

其他应用
 
 
 
Information theory also has applications in [[Gambling and information theory]], [[black hole information paradox|black holes]], and [[bioinformatics]].

信息论在赌博和信息论、黑洞信息悖论和生物信息学中也有应用。
  
  
==See also==

参见
  
 
{{Portal|Mathematics}}
 
===MOOC on information theory===

信息论大型开放式课程
  
 
* Raymond W. Yeung, "[http://www.inc.cuhk.edu.hk/InformationTheory/index.html Information Theory]" ([[The Chinese University of Hong Kong]])
 

2020年9月1日 (二) 15:27的版本





Information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude Shannon in 1948 to find fundamental limits on signal processing and communication operations such as data compression, in a landmark paper titled "A Mathematical Theory of Communication". Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.


信息论研究的是信息的量化、存储与传播。信息论最初是由Claude Shannon在1948年的一篇题为"A Mathematical Theory of Communication"的论文中提出的,其目的是找到信号处理和通信操作(如数据压缩)的基本限制。信息论对于旅行者号深空探测任务的成功、光盘的发明、移动电话的可行性、互联网的发展、语言学和人类感知的研究、对黑洞的理解以及许多其他领域的研究都是至关重要的。



The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, information engineering, and electrical engineering. The theory has also found applications in other areas, including statistical inference, natural language processing, cryptography, neurobiology,[1] human vision,[2] the evolution[3] and function[4] of molecular codes (bioinformatics), model selection in statistics,[5] thermal physics,[6] quantum computing, linguistics, plagiarism detection,[7] pattern recognition, and anomaly detection. Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory, information-theoretic security, Grey system theory and measures of information.


信息论的重要分支包括信源编码、算法复杂性理论、算法信息论、信息论安全性、灰色系统理论和信息度量。



Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP files), lossy data compression (e.g. MP3s and JPEGs), and channel coding (e.g. for DSL). Information theory is used in information retrieval, intelligence gathering, gambling, and even in musical composition.


信息论基本主题的应用包括无损数据压缩(例如 ZIP 文件)、有损数据压缩(例如 MP3 和 JPEG),以及信道编码(例如用于 DSL)。信息论还应用于信息检索、情报收集、赌博,甚至音乐创作。



A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy.


信息论中的一个关键度量是熵。熵量化了一个随机变量的取值或一个随机过程的结果所包含的不确定性。例如,识别一次公平抛硬币的结果(有两个等可能的结果)所提供的信息(熵较低)少于指定一次掷骰子的结果(有六个等可能的结果)。信息论中的其他一些重要度量包括互信息、信道容量、误差指数和相对熵。
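The coin-versus-die comparison above works out numerically as follows (a minimal sketch; for a uniform distribution over n outcomes the entropy is simply log2 n bits).

```python
import math

# Entropy of a uniform distribution over n equally likely outcomes.
def entropy_uniform(n: int) -> float:
    return math.log2(n)

coin = entropy_uniform(2)  # fair coin flip: exactly 1 bit
die = entropy_uniform(6)   # fair six-sided die: about 2.585 bits
assert coin == 1.0
assert die > coin          # the die roll carries more information
```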



Overview


概览



Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", in which "information" is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.[1]


信息论主要研究信息的传输、处理、提取和利用。抽象地说,信息可以被视为对不确定性的消解。1948年,Claude Shannon 在其论文 "A Mathematical Theory of Communication" 中将在噪声信道上传递信息这一抽象概念具体化:论文中把“信息”看作一组可能的消息,目标是通过噪声信道发送这些消息,并让接收端在信道噪声的影响下仍能以较低的错误概率重构消息。Shannon 的主要结果,即有噪信道编码定理表明,在信道使用次数趋于无穷的极限下,渐近可达的信息速率等于信道容量,而信道容量仅取决于消息所经信道的统计特性。



Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity science, cybernetics, informatics, machine learning, along with systems sciences of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory.


信息论与一系列纯粹学科和应用学科密切相关;在过去半个多世纪里,这些学科在世界各地以各种名目被研究并转化为工程实践,例如自适应系统、预期系统、人工智能、复杂系统、复杂性科学、控制论、信息学、机器学习,以及各类系统科学。信息论是一个既广泛又深刻的数学理论,具有同样广泛而深刻的应用,其中编码理论是一个至关重要的领域。



Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.


编码理论关注寻找称为编码的明确方法,以提高效率,并将噪声信道上数据通信的错误率降低到接近信道容量的水平。这些编码可大致分为数据压缩(信源编码)和纠错(信道编码)技术。对于后者,人们花了很多年才找到 Shannon 的工作所证明可能存在的方法。



A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. See the article ban (unit) for a historical application.


第三类信息论编码是密码算法(包括编码和密码)。编码理论和信息论的概念、方法和结果在密码学和密码分析中得到了广泛应用。有关历史上的应用,请参见词条 ban(单位)。



Historical background


历史背景




The landmark event that established the discipline of information theory and brought it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.


1948年7月和10月,Claude Shannon在Bell System Technical Journal上发表了经典论文:"A Mathematical Theory of Communication",这是建立信息论学科并立即引起全世界关注的里程碑事件。



Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation W = K log m (recalling Boltzmann's constant), where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as H = log Sn = n log S, where S was the number of possible symbols, and n the number of symbols in a transmission. The unit of information was therefore the decimal digit, which has since sometimes been called the hartley in his honor as a unit or scale or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war Enigma ciphers.


在这篇论文之前,贝尔实验室已经发展出一些有限的信息论思想,它们都隐含地假设各事件等概率发生。Harry Nyquist 1924年的论文 Certain Factors Affecting Telegraph Speed 包含一个理论部分,量化了“情报”以及通信系统传输情报的“线路速度”,给出关系式 W = K log m(可联系玻尔兹曼常数),其中 W 是情报传输的速度,m 是每个时间步可供选择的不同电压电平数,K 是一个常数。Ralph Hartley 1928年的论文 Transmission of Information 将“信息”一词用作可测量的量,反映接收者区分不同符号序列的能力,从而将信息量化为 H = log S^n = n log S,其中 S 是可能符号的数量,n 是一次传输中的符号数。因此信息的单位是十进制数字;后来为纪念他,这一信息的单位、尺度或度量有时被称为哈特利(hartley)。1940年,Alan Turing 在对破译二战德国恩尼格玛密码(Enigma)的统计分析中使用了类似的思想。
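Hartley's measure H = n log S is simple enough to compute directly (the message length and alphabet below are invented for illustration; base-10 logarithms give the answer in decimal digits, i.e. hartleys).

```python
import math

# Hartley's 1928 measure: H = n * log10(S) hartleys for a message of
# n symbols drawn from an alphabet of S possible symbols.
def hartley_information(n_symbols: int, alphabet_size: int) -> float:
    return n_symbols * math.log10(alphabet_size)

# One decimal digit carries exactly 1 hartley:
assert hartley_information(1, 10) == 1.0

# A 10-character message over a 26-letter alphabet:
h = hartley_information(10, 26)
assert abs(h - 14.1497) < 0.001  # about 14.15 hartleys
```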

Much of the mathematics behind information theory with events of different probabilities were developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.


信息论背后涉及不同概率事件的许多数学理论,是由 Ludwig Boltzmann 和 J. Willard Gibbs 为热力学领域发展起来的。信息论熵与热力学熵之间的联系,包括 Rolf Landauer 在20世纪60年代的重要贡献,在 Entropy in thermodynamics and information theory 中有所探讨。



In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that


1944年底之前,Shannon的工作在贝尔实验室已基本完成。在Shannon的开创性的论文中首次引入了定性和定量的通信模型,将其作为信息理论基础的统计过程。

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

“通信的基本问题是在一点上精确地或近似地再现在另一点上选取的消息。”



With it came the ideas of


由此产生了以下思想:


* the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;



* the bit—a new way of seeing the most fundamental unit of information.






Quantities of information


信息的度量




Information theory is based on probability theory and statistics. Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.


信息论以概率论和统计学为基础,经常涉及与随机变量相关的分布的信息度量。重要的信息量包括:熵(单个随机变量中信息的度量)和互信息(两个随机变量之间共有信息的度量)。前者是随机变量概率分布的属性,它给出了由具有给定分布的独立样本生成的数据能够被可靠压缩的速率上限。后者是两个随机变量联合分布的属性:当信道统计特性由该联合分布确定时,它是在长块长度极限下通过噪声信道进行可靠通信的最大速率。
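The mutual information described above can be computed directly from a joint distribution via I(X;Y) = Σ p(x,y) log2 [p(x,y) / (p(x)p(y))]. The joint table below is invented for illustration: two correlated binary variables versus an independent pair.

```python
import math

def mutual_information(pxy):
    """Mutual information in bits from a dict {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p   # marginal p(x)
        py[y] = py.get(y, 0.0) + p   # marginal p(y)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Correlated pair: X and Y usually agree.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
assert abs(mutual_information(joint) - 0.278072) < 1e-5  # ≈ 0.28 bits

# Independent pair: mutual information is zero.
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
assert mutual_information(indep) == 0.0
```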



The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.


在下列公式中,对数底数的选择决定了信息熵的单位。信息的常见单位是比特(基于二进制对数)。其他单位包括nat(自然对数)和十进制数字(常用对数)。
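The choice of base only rescales the entropy by a constant factor, which is easy to verify numerically (the example distribution is invented for illustration):

```python
import math

# Entropy of a distribution in an arbitrary logarithm base:
# base 2 gives bits, base e gives nats, base 10 gives hartleys.
def entropy(p, base=2.0):
    return -sum(x * math.log(x, base) for x in p if x > 0)

p = [0.5, 0.25, 0.25]
bits = entropy(p, 2)          # 1.5 bits
nats = entropy(p, math.e)     # the same quantity measured in nats

# Converting between units is multiplication by ln 2:
assert abs(bits - 1.5) < 1e-12
assert abs(bits * math.log(2) - nats) < 1e-12
```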



In what follows, an expression of the form p log p is considered by convention to be equal to zero whenever p = 0. This is justified because [math]\displaystyle{ \lim_{p \rightarrow 0+} p \log p = 0 }[/math] for any logarithmic base.


下文中,按照惯例,每当 p = 0 时,形如 p log p 的表达式都视为等于零。这是合理的,因为对任何对数底都有 [math]\displaystyle{ \lim_{p \rightarrow 0+} p \log p = 0 }[/math]。



Entropy of an information source


信源的熵

Based on the probability mass function of each source symbol to be communicated, the Shannon entropy H, in units of bits (per symbol), is given by [math]\displaystyle{ H = - \sum_{i} p_i \log_2 (p_i) }[/math]

基于每个要通信的源符号的概率质量函数,香农熵(以比特为单位)由下式给出: [math]\displaystyle{ H = - \sum_{i} p_i \log_2 (p_i) }[/math]


where pi is the probability of occurrence of the i-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the shannon in his honor. Entropy is also commonly computed using the natural logarithm (base e, where e is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base 28 = 256 will produce a measurement in bytes per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.


其中 pi 是源符号第 i 个可能取值出现的概率。该方程以“比特”(每符号)为单位给出熵,因为它使用以2为底的对数;为纪念香农,这种以2为底的熵度量有时被称为香农(shannon)。熵也常用自然对数计算(以 e 为底,其中 e 是欧拉数),此时熵的单位是奈特(nat)每符号,有时还能因公式中无需引入额外常数而简化分析。也可以使用其他底数,但较少见。例如,以 2^8 = 256 为底的对数给出以字节为单位的每符号度量,而以10为底的对数则给出以十进制数字(或哈特利)为单位的每符号度量。
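The formula above translates directly into code (a minimal sketch; the probability mass functions are invented for illustration):

```python
import math

# Shannon entropy H = -sum_i p_i log2(p_i), in bits per symbol.
def shannon_entropy(pmf):
    assert abs(sum(pmf) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Four equally likely symbols need log2(4) = 2 bits each:
assert shannon_entropy([0.25, 0.25, 0.25, 0.25]) == 2.0

# A certain outcome carries no information:
assert shannon_entropy([1.0]) == 0.0

# A biased source carries less than the uniform maximum of 1 bit:
assert shannon_entropy([0.9, 0.1]) < 1.0
```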


Intuitively, the entropy HX of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X when only its distribution is known.


直观地看,离散随机变量 X 的熵 HX 度量的是在仅知道其分布的情况下,X 的取值所带有的不确定性的大小。



The entropy of a source that emits a sequence of N symbols that are independent and identically distributed (iid) is NH bits (per message of N symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length N will be less than NH.


发出 N 个独立同分布(iid)符号组成的序列的信源,其熵为 NH 比特(每条含 N 个符号的消息)。如果源数据符号同分布但不独立,则长度为 N 的消息的熵将小于 NH。



The entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.

The entropy of a Bernoulli trial as a function of success probability, often called the , . The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.

Bernoulli trial的熵作为成功概率的函数,通常称作模板:Em, Hb(p)。当两个可能的结果发生概率相等时(例如投掷无偏硬币),每次试验的熵最大为1比特。



If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If 𝕏 is the set of all messages {x1, …, xn} that X could be, and p(x) is the probability of some [math]\displaystyle{ x \in \mathbb X }[/math], then the entropy, H, of X is defined:[8]

If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If 𝕏 is the set of all messages {x1, …, xn} that X could be, and p(x) is the probability of some [math]\displaystyle{ x \in \mathbb X }[/math], then the entropy, H, of X is defined:

如果发送1000比特(0和1),而接收者在传输之前就已经知道其中每一比特的值(即每一比特都有确定的取值),那么显然没有传输任何信息。但若每一比特独立且等可能地取0或1,则传输了1000香农的信息(更常称为比特)。在这两个极端之间,信息可按如下方式量化:如果𝕏是X可能取的所有消息{x1, …, xn}的集合,p(x)是某个[math]\displaystyle{ x \in \mathbb X }[/math]的概率,那么X的熵H定义为:[8]




[math]\displaystyle{ H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x). }[/math]

[math]\displaystyle{ H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x). }[/math]

[math]\displaystyle{ H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x). }[/math]
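作为示意,下面给出上式的一个最小Python实现草例(函数名entropy与示例分布均为说明而任意选取,并非标准库接口):

```python
import math

def entropy(probs, base=2):
    """香农熵 H(X) = -Σ p(x) log p(x);约定 0·log 0 = 0。"""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# 均匀的八面骰子:每个结果平均需要 log2(8) = 3 比特
print(entropy([1/8] * 8))    # 3.0
# 有偏硬币的不确定性低于均匀硬币
print(entropy([0.9, 0.1]))   # ≈ 0.469
```

改变base参数即可得到上文所述的其他单位(base=math.e 给出奈特,base=10 给出哈特利)。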



(Here, I(x) is the self-information, which is the entropy contribution of an individual message, and 𝔼X is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable p(x) = 1/n; i.e., most unpredictable, in which case H(X) = log n.

(Here, I(x) is the self-information, which is the entropy contribution of an individual message, and 𝔼X is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable p(x) = 1/n; i.e., most unpredictable, in which case H(X) = log n.

(其中I(x)是自信息,即单条消息的熵贡献;𝔼X表示对X取期望。)熵的一个性质是:当消息空间中所有消息等概率,即p(x) = 1/n(此时最不可预测)时,熵取得最大值H(X) = log n。



The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:

具有两个结果的随机变量的信息熵是一种特殊情形,即二元熵函数,通常取以2为底的对数,因此以香农(Sh)为单位:



[math]\displaystyle{ H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p). }[/math]

[math]\displaystyle{ H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p). }[/math]

[math]\displaystyle{ H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p). }[/math]
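二元熵函数可以直接写成几行代码来验证上面的说法(示意性实现,函数名h_b为任意选取):

```python
import math

def h_b(p):
    """二元熵(单位:香农/比特);约定 p = 0 或 1 时取 0。"""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h_b(0.5))   # 1.0:均匀硬币每次试验恰好 1 比特,熵取得最大值
print(h_b(0.11))  # ≈ 0.5:强偏硬币每次试验的信息量只有约半比特
```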



Joint entropy

Joint entropy

联合熵

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing: (X, Y). This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing: (X, Y). This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.

两个离散随机变量X和Y的联合熵就是二元组(X, Y)的熵。这意味着:若X和Y相互独立,则它们的联合熵是各自熵之和。



For example, if (X, Y) represents the position of a chess piece — X the row and Y the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

For example, if (X, Y) represents the position of a chess piece — X the row and Y the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

例如,如果(X, Y)代表棋子的位置(X表示行,Y表示列),那么棋子所在行与所在列的联合熵就是棋子位置的熵。



[math]\displaystyle{ H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \, }[/math]

[math]\displaystyle{ H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \, }[/math]

[math]\displaystyle{ H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \, }[/math]
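结合正文中的棋子例子,可以用一个小草例验证独立情形下联合熵的可加性(实现方式为示意性选取):

```python
import math

def joint_entropy(pxy):
    """联合熵 H(X, Y) = -Σ p(x, y) log2 p(x, y)。"""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

# 棋子例子:行 X 与列 Y 各自在 8 个取值上均匀分布且相互独立,
# 因此 H(X, Y) = H(X) + H(Y) = 3 + 3 = 6 比特。
pxy = {(r, c): 1 / 64 for r in range(8) for c in range(8)}
print(joint_entropy(pxy))  # 6.0
```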



Despite similar notation, joint entropy should not be confused with cross-entropy.

Despite similar notation, joint entropy should not be confused with cross-entropy.

尽管符号相似,注意联合熵与交叉熵不能混淆。



Conditional entropy (equivocation)

Conditional entropy (equivocation)

条件熵(含糊度)

The conditional entropy or conditional uncertainty of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:[9]

The conditional entropy or conditional uncertainty of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:

在给定随机变量Y时X的条件熵(或条件不确定性,也称为X关于Y的含糊度)是在Y上取平均的条件熵:[9]




[math]\displaystyle{ H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y). }[/math]

[math]\displaystyle{ H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y). }[/math]

[math]\displaystyle{ H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y). }[/math]



Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

由于熵能够以随机变量或该随机变量的某个值为条件,所以应注意不要混淆条件熵的这两个定义(前者更为常用)。该类条件熵的一个基本属性为:



[math]\displaystyle{ H(X|Y) = H(X,Y) - H(Y) .\, }[/math]
[math]\displaystyle{  H(X|Y) = H(X,Y) - H(Y) .\, }[/math]
[math]\displaystyle{ H(X|Y) = H(X,Y) - H(Y) .\, }[/math]
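这一恒等式可以在一个任意选取的联合分布上数值验证(以下分布与函数名均为示意):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 任取一个 {0,1}×{0,1} 上的联合分布 p(x, y)
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
py = {0: 0.6, 1: 0.4}  # 边缘分布 p(y)

# 按定义直接计算条件熵:-Σ p(x,y) log2 p(x|y),其中 p(x|y) = p(x,y)/p(y)
h_cond = -sum(p * math.log2(p / py[y]) for (x, y), p in pxy.items())
# 与 H(X,Y) - H(Y) 相符
print(abs(h_cond - (H(pxy.values()) - H(py.values()))) < 1e-12)  # True
```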



Mutual information (transinformation)

Mutual information (transinformation)

互信息(转移信息)

Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of X relative to Y is given by:

Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of X relative to Y is given by:

互信息度量的是通过观察一个随机变量所能获得的关于另一个随机变量的信息量。它在通信中十分重要,可用于最大化发送信号与接收信号之间共享的信息量。X相对于Y的互信息由下式给出:



[math]\displaystyle{ I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)} }[/math]

[math]\displaystyle{ I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)} }[/math]

[math]\displaystyle{ I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)} }[/math]

where SI (Specific mutual Information) is the pointwise mutual information.

where SI (Specific mutual Information) is the pointwise mutual information.

其中SI(specific mutual information,特定互信息)即点互信息(pointwise mutual information)。



A basic property of the mutual information is that

A basic property of the mutual information is that

互信息的一个基本属性为:

[math]\displaystyle{ I(X;Y) = H(X) - H(X|Y).\, }[/math]
[math]\displaystyle{ I(X;Y) = H(X) - H(X|Y).\, }[/math]
[math]\displaystyle{ I(X;Y) = H(X) - H(X|Y).\, }[/math]

That is, knowing Y, we can save an average of I(X; Y) bits in encoding X compared to not knowing Y.

That is, knowing Y, we can save an average of I(X; Y) bits in encoding X compared to not knowing Y.

也就是说,与不知道Y相比,知道Y时编码X平均可以节省I(X; Y)比特。



Mutual information is symmetric:

Mutual information is symmetric:

互信息是对称的:

[math]\displaystyle{ I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\, }[/math]
[math]\displaystyle{ I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\, }[/math]
[math]\displaystyle{ I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\, }[/math]
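下面的草例在一个任意选取的小联合分布上按定义计算互信息,并验证恒等式 I(X;Y) = H(X) + H(Y) − H(X,Y):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 任意选取的联合分布及其边缘分布
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {0: 0.5, 1: 0.5}
py = {0: 0.6, 1: 0.4}

# 按定义计算 I(X;Y) = Σ p(x,y) log2 [ p(x,y) / (p(x) p(y)) ]
mi = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

# 与 H(X) + H(Y) - H(X,Y) 相符
print(abs(mi - (H(px.values()) + H(py.values()) - H(pxy.values()))) < 1e-12)  # True
print(round(mi, 4))  # 0.1245
```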



Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of X given the value of Y and the prior distribution on X:

Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of X given the value of Y and the prior distribution on X:

互信息可以表示为:在给定Y的值时X的后验概率分布与X的先验分布之间的平均Kullback–Leibler散度(信息增益):

[math]\displaystyle{ I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )]. }[/math]
[math]\displaystyle{ I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )]. }[/math]
[math]\displaystyle{ I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )]. }[/math]

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

换句话说,这度量的是:若我们知道了Y的值,X上的概率分布平均会改变多少。它通常被重新表述为从边缘分布的乘积到实际联合分布的散度:

[math]\displaystyle{ I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)). }[/math]
[math]\displaystyle{ I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)). }[/math]
[math]\displaystyle{ I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)). }[/math]



Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ2 test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ2 test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

在列联表和多项分布的背景下,互信息与对数似然比检验以及皮尔逊χ2检验密切相关:互信息可视为评估一对变量之间独立性的统计量,并具有明确的渐近分布。



Kullback–Leibler divergence (information gain)

Kullback–Leibler divergence (information gain)

Kullback-Leibler 散度(信息增益)

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

Kullback–Leibler散度(或信息散度、信息增益、相对熵)是比较两个分布的一种方法:一个"真实的"概率分布p(X)和一个任意的概率分布q(X)。如果我们以假设q(X)是数据背后分布的方式压缩数据,而实际上p(X)才是正确的分布,那么Kullback–Leibler散度就是压缩时每个数据平均所需的额外比特数。它因此定义为:



[math]\displaystyle{ D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}. }[/math]

[math]\displaystyle{ D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}. }[/math]

[math]\displaystyle{ D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}. }[/math]
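下面的草例按定义计算KL散度,并演示其不对称性(函数名与示例分布均为说明而任意选取):

```python
import math

def kl(p, q):
    """D_KL(p‖q),单位为比特;假设在 p(x) > 0 处均有 q(x) > 0。"""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # "真实"分布
q = [0.9, 0.1]   # 压缩器所假设的分布
print(kl(p, q))  # ≈ 0.737:按 q 压缩来自 p 的数据,每个符号平均多付约 0.74 比特
print(kl(q, p))  # ≈ 0.531:注意 KL 散度不对称
```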



Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).

Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).

尽管有时被用作一种"距离度量",KL散度并不是真正的度量,因为它不对称,也不满足三角不等式(因此是一种半拟度量)。



Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.

Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.

KL散度的另一种解释是先验偏离事实所带来的"不必要的惊讶":假设将从概率分布为p(x)的离散集合中随机抽取一个数X。如果Alice知道真实分布p(x),而Bob(作为先验)认为分布是q(x),那么平均而言,Bob在看到X的值时会比Alice更惊讶。KL散度就是Bob的(主观)惊讶度减去Alice的惊讶度的(客观)期望值(若对数以2为底,则以比特为单位)。这样,Bob的先验"错误"的程度,就可以用预期使他"不必要地惊讶"的程度来量化。


Other quantities

Other quantities

其他度量

Other important information theoretic quantities include Rényi entropy (a generalization of entropy), differential entropy (a generalization of quantities of information to continuous distributions), and the conditional mutual information.

Other important information theoretic quantities include Rényi entropy (a generalization of entropy), differential entropy (a generalization of quantities of information to continuous distributions), and the conditional mutual information.

信息论中其他重要的量包括Rényi熵(一种熵的推广),微分熵(信息量推广到连续分布),以及条件互信息。



Coding theory

Coding theory

编码理论




A picture showing scratches on the readable surface of a CD-R. Music and data CDs are coded using error correcting codes and thus can still be read even if they have minor scratches using error detection and correction.

A picture showing scratches on the readable surface of a CD-R. Music and data CDs are coded using error correcting codes and thus can still be read even if they have minor scratches using error detection and correction.

一张显示CD-R可读表面划痕的图片。音乐和数据CD使用纠错码编码,因此即使有轻微划痕,也仍可借助错误检测与纠正进行读取。



Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.

编码理论是信息论最重要、最直接的应用之一,可以细分为信源编码理论和信道编码理论。信息论利用数据的统计描述,量化描述数据所需的比特数,即信源的信息熵。



  • Data compression (source coding): There are two formulations for the compression problem:
  • 数据压缩(信源编码):压缩问题有两种表述:


  • lossless data compression: the data must be reconstructed exactly;
  • 无损数据压缩:数据必须被精确重构;

  • lossy data compression: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called rate–distortion theory.
  • 有损数据压缩:在由失真函数衡量的指定保真度水平内,分配重构数据所需的比特数。信息论的这一分支称为率失真理论。


  • Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.
  • 纠错码(信道编码):数据压缩尽可能多地消除冗余,而纠错码则添加恰当类型的冗余(即纠错),以便在有噪信道上高效且可靠地传输数据。




This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

信息传输定理(或信源-信道分离定理)证明了将编码理论划分为压缩和传输两部分是合理的;这些定理说明,在许多情况下使用比特作为信息的"通用货币"是正当的。然而,这些定理只在单个发送用户希望与单个接收用户通信的情况下成立。在具有多个发送器(多址信道)、多个接收器(广播信道)或中间"协助者"(中继信道)的场景中,或在更一般的网络中,先压缩再传输可能不再是最优的。网络信息论研究的就是这些多主体通信模型。


Source theory

Source theory

源理论

Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

生成连续消息的任何过程都可以视为信息源。无记忆信源是指每条消息都是独立同分布随机变量的信源;相比之下,遍历性和平稳性施加的约束较为宽松。所有这些信源都是随机的。这些术语在信息论之外也已得到深入研究。


Rate

Rate

速率

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

信息速率是每个符号的平均熵。对于无记忆信源,它就是每个符号的熵;而对于平稳随机过程,它是:



[math]\displaystyle{ r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots); }[/math]

[math]\displaystyle{ r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots); }[/math]

[math]\displaystyle{ r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots); }[/math]



that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

也就是在给定之前生成的所有符号的条件下,一个符号的条件熵。对于不一定平稳的更一般的过程,平均速率为:



[math]\displaystyle{ r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n); }[/math]

[math]\displaystyle{ r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n); }[/math]

[math]\displaystyle{ r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n); }[/math]
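作为示例,下面计算一个两状态马尔可夫链的熵率(参数a、b为任意选取的示意值):对平稳链,上式化为按平稳分布加权的条件熵。

```python
import math

def h_b(p):
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# 两状态马尔可夫链:状态 0 以概率 a 翻转到 1,状态 1 以概率 b 翻转到 0
a, b = 0.1, 0.3
# 平稳分布:pi0 = b/(a+b), pi1 = a/(a+b)
pi0, pi1 = b / (a + b), a / (a + b)
# 熵率 r = Σ_i pi_i · H(下一符号 | 当前状态 = i)
rate = pi0 * h_b(a) + pi1 * h_b(b)
print(round(rate, 3))  # 0.572 比特/符号,低于 iid 均匀二元信源的 1 比特
```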



that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.[10]

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.

也就是每个符号的联合熵的极限。对于平稳信源,这两个表达式给出相同的结果。[11]




It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.

在信息论中,谈论一种语言的"速率"或"熵"是很常见的,例如当信源是英文散文时就很合适。信源的速率与其冗余度以及可被压缩的程度有关,这正是信源编码的研究主题。



Channel capacity

Channel capacity

信道容量




Communications over a channel—such as an ethernet cable—is the primary motivation of information theory. However, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

Communications over a channel—such as an ethernet cable—is the primary motivation of information theory. However, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

通过信道(例如以太网电缆)进行通信是信息论的主要动机。然而,这样的信道往往不能精确重建信号;噪声、静默时段以及其他形式的信号损坏常常会降低质量。



Consider the communications process over a discrete channel. A simple model of the process is shown below:

Consider the communications process over a discrete channel. A simple model of the process is shown below:

考虑离散信道上的通信过程。该过程的简单模型如下:



Channel model

Channel model

[[File:Channel model.svg|center|800px|信道模型]]



Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let p(y|x) be the conditional probability distribution function of Y given X. We will consider p(y|x) to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let p(y|x) be the conditional probability distribution function of Y given X. We will consider p(y|x) to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of f(x), the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the channel capacity and is given by:

这里X表示单位时间内通过信道发送的消息空间,Y表示单位时间内通过信道接收的消息空间。设p(y|x)是给定X时Y的条件概率分布函数。我们将p(y|x)视为通信信道固有的固定属性(代表信道噪声的性质)。那么X和Y的联合分布完全由信道和我们选择的f(x)(即我们选择通过信道发送的消息的边缘分布)决定。在这些约束下,我们希望最大化能够通过信道传递的信息速率(即信号速率)。对此合适的度量是互信息,而这个最大互信息称为信道容量,由下式给出:


[math]\displaystyle{ C = \max_{f} I(X;Y).\! }[/math]

[math]\displaystyle{ C = \max_{f} I(X;Y).\! }[/math]

[math]\displaystyle{ C = \max_{f} I(X;Y).\! }[/math]

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

信道容量具有以下与以信息速率R(R通常为每符号比特数)通信有关的性质:对于任意信息速率R < C和编码错误ε > 0,当N足够大时,存在长度为N、速率≥R的编码以及相应的解码算法,使得块错误的最大概率≤ε;也就是说,总是可以以任意小的块错误概率进行传输。此外,对于任何速率R > C,都不可能以任意小的块错误概率进行传输。



Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

信道编码就寻找一种接近最优的编码,它可以用于在噪声信道上以接近信道容量的速率传输数据,且编码错误很小。



Capacity of particular channel models

Capacity of particular channel models

特定信道模型的容量

  • A continuous-time analog communications channel subject to Gaussian noise (see the Shannon–Hartley theorem).
  • 连续时间内受高斯噪声(Gaussian noise)限制的模拟通信信道(详细内容请参见Shannon–Hartley定理)。
  • A binary symmetric channel (BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of 1 − Hb(p) bits per channel use, where Hb is the binary entropy function to the base-2 logarithm:
  • 二进制对称信道(binary symmetric channel,BSC)的交叉概率为p,它是二进制输入、二进制输出的信道,以概率p翻转输入比特。BSC每次信道使用的容量为1 − Hb(p)比特,其中Hb是以2为底对数的二元熵函数:



[[File:Binary symmetric channel.svg|center|二进制对称信道]]
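BSC的容量公式可以直接数值化(示意草例,沿用上文的二元熵函数):

```python
import math

def h_b(p):
    return 0.0 if p in (0, 1) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """交叉概率为 p 的二进制对称信道的容量,单位:比特/信道使用。"""
    return 1 - h_b(p)

print(bsc_capacity(0.0))   # 1.0:无噪比特携带完整的 1 比特
print(bsc_capacity(0.11))  # ≈ 0.5
print(bsc_capacity(0.5))   # 0.0:纯噪声,输出与输入独立
```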



  • A binary erasure channel (BEC) with erasure probability p is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is 1 − p bits per channel use.
  • 二进制擦除信道(binary erasure channel,BEC)的擦除概率为p,它是二进制输入、三进制输出的信道。可能的信道输出为0、1和称为"擦除"的第三个符号'e'。擦除表示关于输入比特的信息完全丢失。BEC每次信道使用的容量为1 − p比特。




[[File:Binary erasure channel.svg|center|二进制擦除信道]]



Applications to other fields

Applications to other fields

其他领域的应用



Intelligence uses and secrecy applications

Intelligence uses and secrecy applications

情报使用和安保中的应用

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.

信息论的概念可应用于密码学与密码分析。图灵的信息单位ban曾用于Ultra项目,破解了德国的恩尼格玛(Enigma)密码机,加速了二战在欧洲的结束。香农本人定义了一个重要概念,现称为单一性距离(unicity distance):基于明文的冗余度,它试图给出确保唯一可解密性所需的最少密文量。
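单一性距离常写作 U ≈ H(K)/D,其中H(K)为密钥熵,D为明文每字符的冗余度。下面以简单替换密码为例做一个示意性估算(英文约1.5比特/字符的熵是一个说明性假设,并非精确值):

```python
import math

# 单一性距离 U ≈ H(K) / D(示意性估算)
key_entropy = math.log2(math.factorial(26))  # 简单替换密码的密钥熵 ≈ 88.4 比特
redundancy = math.log2(26) - 1.5             # 冗余度 ≈ 3.2 比特/字符(假设英文约 1.5 比特/字符)
print(round(key_entropy / redundancy, 1))    # ≈ 27.6:约 28 个密文字符即可唯一确定密钥
```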



Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.

Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.

信息论使我们认识到,保守秘密比乍看起来要困难得多。暴力破解攻击可以攻破基于非对称密钥算法或最常用的对称密钥算法(有时称为秘密密钥算法,例如分组密码)的系统。目前,所有这些方法的安全性都基于如下假设:在可行的时间内,没有任何已知的攻击能够破解它们。



Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.

Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.

信息论安全性指的是诸如一次性密钥这类不易受到此种穷举攻击的方法。在这种情况下,明文与密文之间(以密钥为条件)的正条件互信息可以确保正确传输,而明文与密文之间的无条件互信息保持为零,从而实现绝对安全的通信。换句话说,窃听者即使获得了密文(但没有密钥),也无法改进其对明文的猜测。但是,与任何其他密码系统一样,即使是信息论意义上安全的方法,也必须小心地正确使用;Venona 项目之所以能够破解苏联的一次性密钥,正是因为苏联不当地重复使用了密钥材料。
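一次性密钥的"完美保密"以及密钥复用的危险,可以用下面的 Python 草图说明(假设性示例;标准库 secrets 模块用于生成均匀随机密钥):

```python
import secrets

# 一次性密钥(one-time pad):密钥与明文等长且均匀随机时,
# 任何一条等长明文都能以相同概率产生给定密文,即 I(明文; 密文) = 0
def otp(data: bytes, key: bytes) -> bytes:
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

msg = b"meet at noon"
key = secrets.token_bytes(len(msg))
ct = otp(msg, key)
assert otp(ct, key) == msg            # 持有密钥者可完全解密

# 对窃听者而言:任选另一条等长"假想明文",总存在某个密钥使其加密为同一密文,
# 因此密文本身不排除任何明文
fake = b"flee at once"
fake_key = bytes(c ^ f for c, f in zip(ct, fake))
assert otp(ct, fake_key) == fake

# 密钥复用(Venona 的教训):两条密文异或后密钥被消去,泄露明文之间的异或
msg2 = b"same length!"
ct2 = otp(msg2, key)
leak = bytes(a ^ b for a, b in zip(ct, ct2))
assert leak == bytes(a ^ b for a, b in zip(msg, msg2))
```

最后几行正说明正文中的警告:密钥材料一经复用,"无条件互信息为零"的前提即被破坏。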



Pseudorandom number generation


伪随机数的生成

Pseudorandom number generators are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed cryptographically secure pseudorandom number generators, but even they require random seeds external to the software to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is min-entropy, a value related to Shannon entropy through Rényi entropy; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.


伪随机数生成器在计算机语言库和应用程序中随处可见。但它们几乎无一例外地不适合密码学用途,因为它们无法摆脱现代计算机设备和软件的确定性。一类改进的随机数生成器称为密码学安全伪随机数生成器,但即使是它们,也需要来自软件外部的随机种子才能按预期工作。如果处理得当,这些种子可以通过提取器(extractor)获得。衡量提取器中随机性是否充分的度量是最小熵,该值通过 Rényi 熵与 Shannon 熵相关联;Rényi 熵也用于评估密码系统中的随机性。这些度量虽然相关,但彼此有别,这意味着具有较高 Shannon 熵的随机变量不一定适合用于提取器,因而也不一定适合密码学用途。
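Shannon 熵与最小熵的差别可以用一个偏斜分布来说明(最小示意,分布为假设性构造):Shannon 熵衡量平均不确定性,而最小熵只看最可能的结果,即攻击者一次猜中的最好概率。

```python
import math

def shannon_entropy(p):
    """Shannon 熵 H(X) = -Σ p·log2(p),单位为比特"""
    return -sum(x * math.log2(x) for x in p if x > 0)

def min_entropy(p):
    """最小熵 H_min(X) = -log2(max p):刻画一次猜测的最大成功概率"""
    return -math.log2(max(p))

# 假设性的偏斜分布:一个符号概率为 1/2,其余 512 个符号均分另外的 1/2
p = [0.5] + [0.5 / 512] * 512
assert abs(sum(p) - 1.0) < 1e-12

H = shannon_entropy(p)   # 0.5*1 + 512*(1/1024)*10 = 5.5 比特
H_min = min_entropy(p)   # -log2(0.5) = 1 比特

assert abs(H - 5.5) < 1e-9
assert abs(H_min - 1.0) < 1e-12
```

该分布的 Shannon 熵有 5.5 比特,但最小熵只有 1 比特:攻击者有一半的把握直接猜中首个符号。这正是正文所说"高 Shannon 熵不一定适合密码学用途"的原因。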


Seismic exploration


地震勘探

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.[12]


信息论的一个早期商业应用是在地震石油勘探领域。在该领域的应用可以从期望的地震信号中剔除和分离不需要的噪声。与以前的模拟方法相比,信息论和数字信号处理大大提高了图像的分辨率和清晰度。[13]
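"从期望信号中剔除并分离噪声"的基本思想可以用一个高度简化的草图说明(假设性的合成数据,并非实际地震处理流程):低频正弦波代表期望信号,叠加高斯噪声后用滑动平均低通滤波压制高频分量。

```python
import math
import random

random.seed(0)  # 使示例可复现

# 合成"地震记录"(假设性数据):低频正弦波为期望信号,叠加高斯噪声
n = 512
signal = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
noisy = [s + random.gauss(0, 0.5) for s in signal]

def moving_average(x, w=15):
    """滑动平均低通滤波:压制高频噪声,保留低频信号(仅作示意)"""
    half = w // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def mse(a, b):
    """均方误差:衡量与期望信号的偏离程度"""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

filtered = moving_average(noisy)
assert mse(filtered, signal) < mse(noisy, signal)  # 滤波后更接近期望信号
```

实际的地震资料处理使用反褶积、维纳滤波等远为精细的数字信号处理方法;此处仅示意分离噪声与信号的基本思想。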



Semiotics


符号学

Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics.[14]:171[15]:137 Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing."[14]:91


符号学家 Doede Nauta 和 Winfried Nöth 都认为,Charles Sanders Peirce 在其符号学著作中创立了一种信息理论。[14]:171[15]:137 Nauta 将符号学信息论定义为对"编码、过滤和信息处理的内部过程"的研究。[14]:91




Concepts from information theory such as redundancy and code control have been used by semioticians such as Umberto Eco and Ferruccio Rossi-Landi to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.[16]



信息论中的概念(例如冗余和代码控制)已被 Umberto Eco 和 Ferruccio Rossi-Landi 等符号学家用来解释意识形态:它被视作一种消息传输形式,占统治地位的社会阶层通过使用具有高度冗余性的符号来发出其消息,使得在一系列相互竞争的消息中只有一条被解码。[16]


Miscellaneous applications


其他应用

Information theory also has applications in gambling, black holes, and bioinformatics.

信息论在赌博、黑洞和生物信息学中也有应用。



See also


另请参阅






  • Constructor theory - a generalization of information theory that includes quantum information










References


参考资料

  1. 1.0 1.1 F. Rieke; D. Warland; R Ruyter van Steveninck; W Bialek (1997). Spikes: Exploring the Neural Code. The MIT press. ISBN 978-0262681087. 
  2. Delgado-Bonal, Alfonso; Martín-Torres, Javier (2016-11-03). "Human vision is determined based on information theory". Scientific Reports (in English). 6 (1): 36038. Bibcode:2016NatSR...636038D. doi:10.1038/srep36038. ISSN 2045-2322. PMC 5093619. PMID 27808236.
  3. Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. (2001). "Bayesian inference of phylogeny and its impact on evolutionary biology". Science. 294 (5550): 2310–2314. Bibcode:2001Sci...294.2310H. doi:10.1126/science.1065889. PMID 11743192.
  4. Allikmets, Rando; Wasserman, Wyeth W.; Hutchinson, Amy; Smallwood, Philip; Nathans, Jeremy; Rogan, Peter K.; Schneider, Thomas D.; Dean, Michael (1998). "Organization of the ABCR gene: analysis of promoter and splice junction sequences". Gene. 215 (1): 111–122. doi:10.1016/s0378-1119(98)00269-8. PMID 9666097.
  5. Burnham, K. P. and Anderson D. R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Second Edition (Springer Science, New York) .
  6. Jaynes, E. T. (1957). "Information Theory and Statistical Mechanics". Phys. Rev. 106 (4): 620. Bibcode:1957PhRv..106..620J. doi:10.1103/physrev.106.620.
  7. Bennett, Charles H.; Li, Ming; Ma, Bin (2003). "Chain Letters and Evolutionary Histories". Scientific American. 288 (6): 76–81. Bibcode:2003SciAm.288f..76B. doi:10.1038/scientificamerican0603-76. PMID 12764940. Archived from the original on 2007-10-07. Retrieved 2008-03-11.
  8. 8.0 8.1 Fazlollah M. Reza (1994) [1961]. An Introduction to Information Theory. Dover Publications, Inc., New York. ISBN 0-486-68210-2. https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22. 
  9. 9.0 9.1 Robert B. Ash (1990) [1965]. Information Theory. Dover Publications, Inc.. ISBN 0-486-66521-6. https://books.google.com/books?id=ngZhvUfF0UIC&pg=PA16&dq=intitle:information+intitle:theory+inauthor:ash+conditional+uncertainty. 
  10. Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann. ISBN 1-55860-369-7. https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1. 
  11. Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann. ISBN 1-55860-369-7. https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1. 
  12. Haggerty, Patrick E. (1981). "The corporation and innovation". Strategic Management Journal. 2 (2): 97–118. doi:10.1002/smj.4250020202.
  13. Haggerty, Patrick E. (1981). "The corporation and innovation". Strategic Management Journal. 2 (2): 97–118. doi:10.1002/smj.4250020202.
  14. 14.0 14.1 14.2 14.3 Nauta, Doede (1972). The Meaning of Information. The Hague: Mouton. ISBN 9789027919960. 
  15. 15.0 15.1 Nöth, Winfried (January 2012). "Charles S. Peirce's theory of information: a theory of the growth of symbols and of knowledge". Cybernetics and Human Knowing. 19 (1–2): 137–161.
  16. Nöth, Winfried (1981). "Semiotics of ideology". Semiotica, Issue 148.




The classic work


经典之作








Other journal articles


其他期刊文章



  • R. Landauer, IEEE.org, "Information is Physical" Proc. Workshop on Physics and Computation PhysComp'92 (IEEE Comp. Sci.Press, Los Alamitos, 1993) pp. 1–4.



  • Timme, Nicholas; Alford, Wesley; Flecker, Benjamin; Beggs, John M. (2012). "Multivariate information measures: an experimentalist's perspective". arXiv:1111.6857 [cs.IT].





Textbooks on information theory


信息论教材





Other books


其他书籍





MOOC on information theory


信息论大型开放式课程




External links


外部链接









Category:Computer science

类别: 计算机科学

Category:Cybernetics

类别: 控制论

Category:Formal sciences

类别: 正规科学

Category:Information Age

类别: 信息时代


This page was moved from wikipedia:en:Information theory. Its edit history can be viewed at 信息论/edithistory