信息论 Information theory

 --本词条已由Ricky审校。感谢本词条的初译者高水平的工作，本词审阅时我最舒心的一次。

模板:Distinguish

模板:Information theory

Information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude Shannon in 1948 to find fundamental limits on signal processing and communication operations such as data compression, in a landmark paper titled "A Mathematical Theory of Communication". Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.

Information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude Shannon in 1948 to find fundamental limits on signal processing and communication operations such as data compression, in a landmark paper titled "A Mathematical Theory of Communication". Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields.

信息论 Information Theory研究的是信息的量化、存储与传播。信息论最初是由克劳德·香农 Claude Shannon在1948年的一篇题为《一种通信的数学理论（A Mathematical Theory of Communication）》的里程碑式论文中提出的，其目的是找到信号处理和通信操作（如数据压缩）的基本限制。信息论对于旅行者号深空探测任务的成功、光盘的发明、移动电话的可行性、互联网的发展、语言学和人类感知的研究、对黑洞的理解以及许多其他领域的研究都是至关重要的。

The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, information engineering, and electrical engineering. The theory has also found applications in other areas, including statistical inference, natural language processing, cryptography, neurobiology,^[1] human vision,^[2] the evolution^[3] and function^[4] of molecular codes (bioinformatics), model selection in statistics,^[5] thermal physics,^[6] quantum computing, linguistics, plagiarism detection,^[7] pattern recognition, and anomaly detection.^[8] Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory, and information-theoretic security.

The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, information engineering, and electrical engineering. The theory has also found applications in other areas, including statistical inference, natural language processing, cryptography, neurobiology, human vision, the evolution and function of molecular codes (bioinformatics), model selection in statistics, thermal physics, quantum computing, linguistics, plagiarism detection, pattern recognition, and anomaly detection.

该领域是数学、统计学、计算机科学、物理学、神经生物学、信息工程和电气工程的交叉学科。这一理论也在其他领域得到了应用，比如推论统计学、自然语言处理、密码学、神经生物学、人类视觉、分子编码的进化和功能(生物信息学)、统计学中的模型选择、热物理学、量子计算、语言学、剽窃检测、模式识别和异常检测。

Important sub-fields of information theory include source coding, algorithmic complexity theory, algorithmic information theory, information-theoretic security, Grey system theory and measures of information.

信息论的重要分支包括信源编码、算法复杂性理论、算法信息论、信息理论安全性、灰色系统理论和信息度量。

Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP files), lossy data compression (e.g. MP3s and JPEGs), and channel coding (e.g. for DSL). Information theory is used in information retrieval, intelligence gathering, gambling, and even in musical composition.

信息论在应用领域的基本课题包括无损数据压缩(例如:ZIP压缩文件)、有损数据压缩(例如:Mp3和jpeg格式) ，以及频道编码(例如:DSL)。信息论在信息检索、情报收集、赌博，甚至在音乐创作中也有应用。

A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a 模板:Dice (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy.

A key measure in information theory is entropy. Entropy quantifies the amount of uncertainty involved in the value of a random variable or the outcome of a random process. For example, identifying the outcome of a fair coin flip (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a dice (with six equally likely outcomes). Some other important measures in information theory are mutual information, channel capacity, error exponents, and relative entropy.

信息论中的一个关键度量是熵。熵量化了一个随机变量的值或者一个随机过程的结果所包含的不确定性。例如，识别一次公平抛硬币的结果(有两个同样可能的结果)所提供的信息(较低的熵)少于识别抛一次骰子的结果(有六个同样可能的结果)。信息论中的其他一些重要指标有：互信息、信道容量、误差指数和相对熵。

概览

Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", in which "information" is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.^[1]

Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", in which "information" is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the noisy-channel coding theorem showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.

信息论主要研究信息的传递、处理、提取和利用。抽象地说，信息可以作为不确定性的解决方案。1948年，克劳德·香农在他的论文《一种通信的数学理论》中将这个抽象的概念具体化，在这篇论文中“信息”被认为是一组可能的信号，这些信号在通过带有噪声的信道发送后，接收者能在信道噪声的影响下以较低的错误概率来重构这些信号。香农的主要结论，有噪信道编码定理，表明在信道使用的许多限制情况下，渐近可达到信息传输速率等于的信道容量，一个仅仅依赖于信息发送所经过的信道本身的统计量。（译注：当信道的信息传输率不超过信道容量时，采用合适的编码方法可以实现任意高的传输可靠性，但若信息传输率超过了信道容量，就不可能实现可靠的传输。）

Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics throughout the world over the past half century or more: adaptive systems, anticipatory systems, artificial intelligence, complex systems, complexity science, cybernetics, informatics, machine learning, along with systems sciences of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory.

信息论与一系列纯科学和应用科学密切相关。在过去半个世纪甚至更久的时间里，在全球范围内已经有各种各样的学科理论被研究和化归为工程实践，比如在自适应系统，预期系统，人工智能，复杂系统，复杂性科学，控制论，信息学，机器学习，以及系统科学。信息论是一个广博而深遂的数学理论，也具有广泛而深入的应用，其中编码理论是至关重要的领域。

Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.

Coding theory is concerned with finding explicit methods, called codes, for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.

编码理论与寻找明确的方法（编码）有关，用于提高效率和将有噪信道上传输的数据错误率降低到接近信道容量。这些编码可大致分为数据压缩编码(信源编码)和纠错(信道编码)技术。对于纠错技术，香农证明了理论极限很多年后才有人找到了真正实现了理论最优的方法。

A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. See the article ban (unit) for a historical application.

A third class of information theory codes are cryptographic algorithms (both codes and ciphers). Concepts, methods and results from coding theory and information theory are widely used in cryptography and cryptanalysis. See the article ban (unit) for a historical application.

第三类信息论代码是密码算法(包括密文和密码)。编码理论和信息论的概念、方法和结果在密码学和密码分析中得到了广泛的应用。

历史背景

The landmark event that established the discipline of information theory and brought it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.

The landmark event that established the discipline of information theory and brought it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948.

1948年7月和10月，克劳德·E·香农在《贝尔系统技术期刊》上发表了经典论文：《一种通信的数学理论》，这就是建立信息论学科并立即引起全世界关注的里程碑事件。

Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation $W = K log m$ (recalling Boltzmann's constant), where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as $H = log S n = n log S$ , where S was the number of possible symbols, and n the number of symbols in a transmission. The unit of information was therefore the decimal digit, which has since sometimes been called the hartley in his honor as a unit or scale or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war Enigma ciphers.

Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation (recalling Boltzmann's constant), where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as , where S was the number of possible symbols, and n the number of symbols in a transmission. The unit of information was therefore the decimal digit, which has since sometimes been called the hartley in his honor as a unit or scale or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war Enigma ciphers.

在此之前，贝尔实验室已经提出了有限的信息论思想，所有这些理论都隐性地假设了概率均等的事件。Harry Nyquist 在1924年发表的论文《集中影响电报速率的因素（Certain Factors Affecting Telegraph Speed）》中包含一个理论章节，量化了“情报”和通信系统可以传输的“线路速度”，并给出了关系式 $W = K log m$ (参考玻尔兹曼常数) ，其中 W 是情报传输的速度， m 是每个时间步长可以选择的不同电压电平数，K 是常数。Ralph Hartley 在1928年发表的论文《信息的传输（ Transmission of Information）》中，将单词信息作为一个可测量的量，以此反映接收者区分一系列符号的能力，从而将信息量化为 $H = log S n = n log S$ ，其中 S 是可以使用的符号的数量，n 是传输中符号的数量。因此信息的单位就是十进制数字，为了表示对他的尊敬，这个单位有时被称为 Hartley，作为信息的单位、尺度或度量。1940年，图灵在二战时期破解德国的“迷”密码（Enigma ciphers）的统计分析中使用了类似的思想。

Much of the mathematics behind information theory with events of different probabilities were developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.

Much of the mathematics behind information theory with events of different probabilities were developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.

信息论背后的许多数学理论（包括不同概率的事件）都是由路德维希·玻尔兹曼和约西亚·威拉德·吉布斯为热力学领域开发出来的。

In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that

香农的那篇革命性的、开创性的论文，于1944年的年底便已基本在贝尔实验室完成。在这论文里，香农将通信看作一个统计学过程，首次提出了通信的量化模型，并以此为基础推导出了信息论。论文开篇便提出了一下论断：

"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

“通信的基本问题是在一点上精确地或近似地再现在另一点上选择的信息”

With it came the ideas of

与此相关的一些想法包括：

the information entropy and redundancy of a source, and its relevance through the source coding theorem;

信息熵和信源冗余，以及信源编码定理；

the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;

互信息，有噪信道的信道容量，包括无损通信的证明，和有噪信道编码定理；

the practical result of the [[]] for the channel capacity of a Gaussian channel; as well as

香农-哈特利定律 Shannon–Hartley law应用于高斯信道的信道容量的结果，以及

the bit—a new way of seeing the most fundamental unit of information.

比特 bit——一种新的度量信息的最基本单位

信息的度量

Information theory is based on probability theory and statistics. Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.

信息论基于概率论和统计学，其中经常涉及衡量随机变量的分布的信息。信息论中重要的信息量有：熵（单个随机变量中信息的度量）和互信息（两个随机变量之间的信息的度量）。熵是随机变量的概率分布的一个属性，它限制了从给定分布中独立采样得到的数据的压缩率。互信息是两个随机变量的联合概率分布的一个属性，是当信道的统计量由联合分布确定时，在长块长度的限制下，通过有噪信道的可靠通信的最大速率。

The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.

在下列公式中，对数底数的选择决定了信息熵的单位。信息的常见单位是比特（基于二进制对数）。其他单位包括 nat（自然对数）和十进制数字（常用对数）。

In what follows, an expression of the form $p log p$ is considered by convention to be equal to zero whenever $p = 0$ . This is justified because [math]\displaystyle{ \lim_{p \rightarrow 0+} p \log p = 0 }[/math] for any logarithmic base.

In what follows, an expression of the form is considered by convention to be equal to zero whenever . This is justified because [math]\displaystyle{ \lim_{p \rightarrow 0+} p \log p = 0 }[/math] for any logarithmic base.

下文中，按惯例将 $p = 0$ 时的表达式 $p log p$ 的值视为等于零，因为[math]\displaystyle{ \lim_{p \rightarrow 0+} p \log p = 0 }[/math]适用于任何对数底。

信源的熵

Based on the probability mass function of each source symbol to be communicated, the Shannon entropy $H$ , in units of bits (per symbol), is given by

Based on the probability mass function of each source symbol to be communicated, the Shannon entropy , in units of bits (per symbol), is given by [math]\displaystyle{ H = - \sum_{i} p_i \log_2 (p_i) }[/math]

基于每个用于通信的源符号的概率质量函数，香农熵 Shannon Entropy（以比特为单位）由下式给出： [math]\displaystyle{ H = - \sum_{i} p_i \log_2 (p_i) }[/math]

where $p i$ is the probability of occurrence of the $i$ -th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the shannon in his honor. Entropy is also commonly computed using the natural logarithm (base $e$ , where $e$ is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base 2⁸ = 256 will produce a measurement in bytes per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.

where is the probability of occurrence of the -th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the shannon in his honor. Entropy is also commonly computed using the natural logarithm (base E (mathematical constant)|, where is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base will produce a measurement in bytes per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.

其中 $p i$ 是源符号的第 $i$ 个可能值出现的概率。该方程以比特（每个符号）为单位给出熵，因为它使用以2为底的对数。为表纪念，这个熵有时被称为香农熵。熵的计算也通常使用自然对数(以 $e$ 为底数，其中 $e$ 是欧拉数，其他底数也是可行的，但不常用)，这样就可以测量每个符号的熵值，有时在公式中可以通过避免额外的常量来简化分析。例如以 $28 = 256$ 为底的对数，得出的值就以字节（而非比特）作为单位。以10为底的对数，每个符号将产生以十进制数字(或哈特利)为单位的测量值。

Intuitively, the entropy $H X$ of a discrete random variable $X$ is a measure of the amount of uncertainty associated with the value of $X$ when only its distribution is known.

Intuitively, the entropy of a discrete random variable is a measure of the amount of uncertainty associated with the value of when only its distribution is known.

直观的来看，离散型随机变量 $X$ 的熵 $H X$ 是对不确定性的度量，当只知道其分布时，它的值与 $X$ 的值相关。

The entropy of a source that emits a sequence of $N$ symbols that are independent and identically distributed (iid) is $N \cdot H$ bits (per message of $N$ symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length $N$ will be less than $N \cdot H$ .

The entropy of a source that emits a sequence of symbols that are independent and identically distributed (iid) is bits (per message of symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length will be less than .

当一个信息源发出了一串含有 $N$ 个符号的序列，且每个符号独立同分布时，其熵为 $N \cdot H$ 位(每个信息 $N$ 符号)。如果源数据符号是同分布但不独立的，则长度为 $N$ 的消息的熵将小于 $N \cdot H$ 。

The entropy of a Bernoulli trial as a function of success probability, often called the 模板:Em,

H b (p)

. The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss. 伯努利实验的熵，作为一个成功概率的函数，通常被称为二值熵函数,

H b (p)

。当使用一个无偏的硬币做实验时，两个可能结果出现的概率相等，此时的熵值最大，为1。

If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If 𝕏 is the set of all messages $模板:Mset$ that $X$ could be, and $p (x)$ is the probability of some [math]\displaystyle{ x \in \mathbb X }[/math], then the entropy, $H$ , of is defined:^[9]

If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If 𝕏 is the set of all messages }} that could be, and is the probability of some [math]\displaystyle{ x \in \mathbb X }[/math], then the entropy, , of is defined:

如果一个人发送了1000比特（0s和1s），然而接收者在发送之前就已知这串比特序列中的每一个位的值，显然这个通信过程并没有任何信息（译注：如果你要告诉我一个我已经知到的消息，那么本次通信没有传递任何信息）。但是，如果消息未知，且每个比特独立且等可能的为0或1时，则本次通信传输了1000香农的信息（通常称为“比特”）。在这两个极端之间，信息可以按以下方式进行量化。如果𝕏是 $X$ 可能在的所有消息的集合 ${x 1, ..., x n$ }，且 $p (x)$ 是[math]\displaystyle{ x \in \mathbb X }[/math]的概率，那么熵、 $H$ 和 $H$ 的定义如下: ^[9]

[math]\displaystyle{ H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x). }[/math]

(Here, is the self-information, which is the entropy contribution of an individual message, and $I (x)$ $𝔼 X$ is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable $p (x) = 1/ n$ ; i.e., most unpredictable, in which case $H (X) = log n$ .

(Here, is the self-information, which is the entropy contribution of an individual message, and is the expected value.) A property of entropy is that it is maximized when all the messages in the message space are equiprobable ; i.e., most unpredictable, in which case .

(其中: $I (x)$ 是自信息，表示单个信息的熵贡献; $I (x)$ $𝔼 X$ 为 $X$ 的期望。)熵的一个特性是，当消息空间中的所有消息都是等概率 $p (x) = 1/ n$ 时熵最大; 也就是说，在 $H (X) = log n$ 这种情况下，熵是最不可预测的。

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:

对于只有两种可能取值的随机变量的信息熵，其特殊情况为二值熵函数（通常用以为底2对数，因此以香农(Sh)为单位）：

[math]\displaystyle{ H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p). }[/math]

联合熵

The 模板:Em of two discrete random variables $X$ and $Y$ is merely the entropy of their pairing: $(X, Y)$ . This implies that if $X$ and $Y$ are independent, then their joint entropy is the sum of their individual entropies.

The of two discrete random variables and is merely the entropy of their pairing: . This implies that if and are independent, then their joint entropy is the sum of their individual entropies.

两个离散的随机变量 $X$ 和 $Y$ 的联合熵 Joint Entropy大致是它们的配对: $(X, Y)$ 。若 $X$ 和 $Y$ 是独立的，那么它们的联合熵就是其各自熵的总和。

For example, if $(X, Y)$ represents the position of a chess piece — $X$ the row and $Y$ the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

For example, if represents the position of a chess piece — the row and the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

例如：如果 $(X, Y)$ 代表棋子的位置（ $X$ 表示行和 $Y$ 表示列），那么棋子所在位置的熵就是棋子行、列的联合熵。

[math]\displaystyle{ H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \, }[/math]

Despite similar notation, joint entropy should not be confused with 模板:Em.

Despite similar notation, joint entropy should not be confused with .

尽管符号相似，注意联合熵与交叉熵不能混淆。

条件熵（含糊度）

The 模板:Em or conditional uncertainty of $X$ given random variable $Y$ (also called the equivocation of $X$ about $Y$ ) is the average conditional entropy over $Y$ :^[10]

The or conditional uncertainty of given random variable (also called the equivocation of about ) is the average conditional entropy over :

在给定随机变量 $Y$ 下 $X$ 的条件熵 Conditional Entropy（或条件不确定性，也可称为 $X$ 关于 $Y$ 的含糊度)）是 $Y$ 上的平均条件熵: ^[10]

[math]\displaystyle{ H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y). }[/math]

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

由于熵能够以随机变量或该随机变量的某个值为条件，所以应注意不要混淆条件熵的这两个定义（前者更为常用）。该类条件熵的一个基本属性为:

[math]\displaystyle{ H(X|Y) = H(X,Y) - H(Y) .\, }[/math]

互信息(转移信息)

Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of $X$ relative to $Y$ is given by:

Mutual information measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of relative to is given by:

互信息 Mutual Information度量的是某个随机变量在通过观察另一个随机变量时可以获得的信息量。在通信中可以用它来最大化发送和接收信号之间共享的信息量，这一点至关重要。 $X$ 相对于 $Y$ 的互信息由以下公式给出:

[math]\displaystyle{ I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)} }[/math]

where $SI$ (Specific mutual Information) is the pointwise mutual information.

where (Specific mutual Information) is the pointwise mutual information.

其中 $SI$ (Specific mutual Information，特定互信息)是点间的互信息。

A basic property of the mutual information is that

互信息的一个基本属性是：

[math]\displaystyle{ I(X;Y) = H(X) - H(X|Y).\, }[/math]

That is, knowing Y, we can save an average of $I (X; Y)$ bits in encoding X compared to not knowing Y.

That is, knowing Y, we can save an average of bits in encoding X compared to not knowing Y.

也就是说，在编码X的过程中，知道Y比不知道Y平均节省 $I (X; Y)$ 比特。

Mutual information is symmetric:

互信息是对称的:

[math]\displaystyle{ I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\, }[/math]

Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the posterior probability distribution of X given the value of Y and the prior distribution on X:

互信息可以表示为在给定Y值的情况下X的后验分布，以及X的先验概率分布之间的平均 Kullback-Leibler 散度(信息增益) :

[math]\displaystyle{ I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )]. }[/math]

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

换句话说，这个指标度量：当我们给出Y的值，得出X上的概率分布将会平均变化多少。这通常用于计算边缘分布的乘积与实际联合分布的差异：

[math]\displaystyle{ I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)). }[/math]

Mutual information is closely related to the log-likelihood ratio test in the context of contingency tables and the multinomial distribution and to Pearson's χ² test: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

互信息与列联表中的似然比检验，多项分布，以及皮尔森卡方检验密切相关: 互信息可以视为评估一对变量之间独立性的统计量，并且具有明确指定的渐近分布。

Kullback-Leibler 散度(信息增益)

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

The Kullback–Leibler divergence (or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

Kullback-Leibler 散度(或信息散度、相对熵、信息增益)是比较两种分布的方法: “真实的”概率分布p(X)和任意概率分布q(X)。若假设q(X)是基于某种方式压缩的数据的分布，而实际上p(X)才是真正分布，那么 Kullback-Leibler 散度是每个数据压缩所需的平均额外比特数。因此定义：

[math]\displaystyle{ D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}. }[/math]

Although it is sometimes used as a 'distance metric', KL divergence is not a true metric since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).

尽管有时会将KL散度用作距离量度但它并不是一个真正的指标，因为它是不对称的，同时也不满足三角不等式(KL散度可以作为一个半准度量)。

Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.

Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.

KL散度的另一种解释是一种先验知识引入的“不必要的惊讶”。假设将从概率分布为“ p（x）”的离散集合中随机抽取数字“ X”，如果Alice知道真实的分布“p（x）”，而Bob（因为具有先验知识）认为概率分布是“q（x）”，那么在看到抽取出来的X的值后，平均而言，Bob将比Alice更加惊讶。KL散度就是Bob惊讶的期望值减去Alice惊讶的期望值（如果对数以2为底，则以比特为单位），这样Bob所拥有的先验知识的“错误的”程度可以用他“不必要的惊讶”的期望值来进行量化。

其他度量

Other important information theoretic quantities include Rényi entropy (a generalization of entropy), differential entropy (a generalization of quantities of information to continuous distributions), and the conditional mutual information.

信息论中其他重要的量包括瑞丽熵 Rényi Entropy(一种熵的推广)，微分熵(信息量推广到连续分布)，以及条件互信息。

编码理论

A picture showing scratches on the readable surface of a CD-R. Music and data CDs are coded using error correcting codes and thus can still be read even if they have minor scratches using error detection and correction.

在可读CD的表面上显示划痕的图片。音乐和数据CD使用纠错编码进行编码，因此即使它们有轻微的划痕，也可以通过错误检测和纠正来对CD进行读取。

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.

编码理论 Coding Theory是信息论最重要、最直接的应用之一，可以细分为信源编码理论 Source Coding Theory和信道编码理论 Channel Coding Theory。信息论使用统计学来量化描述数据所需的比特数，也就是源的信息熵。

Data compression (source coding): There are two formulations for the compression problem:
数据压缩（源编码）：压缩问题有两个相关公式；

lossless data compression: the data must be reconstructed exactly;
[[[无损数据压缩]]：数据必须准确重构；

lossy data compression: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called rate–distortion theory.
有损数据压缩：由失真函数测得的在指定保真度级别内分配重构数据所需的比特数。信息论中的这个部分称为率失真理论。

Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.
纠错码（信道编码）：数据压缩会尽可能多的消除冗余，而纠错码会添加所需的冗余（即纠错），以便在嘈杂的信道上有效且保真地传输数据。

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

信息传输定理，或着说“信源-信道分离定理”证明，编码理论应当划分为压缩和传输两部分。定理证明了在许多情况下使用比特作为信息的通用货币是合理的，但这只在发送用户与特定接收用户建立通信的情况下才成立。在具有多个发送器（多路访问信道），多个接收器（广播信道）或中转器（中继信道）或多个计算机网络的情况下，压缩后再进行传输可能就不再是最佳选择。网络信息论指的就是这些多主体通信模型。

信源理论

Any process that generates successive messages can be considered a 模板:Em of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

Any process that generates successive messages can be considered a of information. A memoryless source is one in which each message is an independent identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

生成连续消息的任何过程都可以视为信息的通讯来源。无记忆信源是指每个消息都是独立同分布的随机变量，而遍历理论和平稳过程的性质对信源施加的限制较少。所有这些信源都可以看作随机的。在信息论领域外，这些术语也已经有很全面的相关研究。

速率

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

信息速率 Information Rate（熵率）是每个符号的平均熵。对于无记忆信源，信息速率仅表示每个符号的熵，而在平稳随机过程中，它是：

[math]\displaystyle{ r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots); }[/math]

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

也就是，给定所有之前生成的符号下，一个符号的条件熵。对于非平稳的过程的更一般情况，平均速率为：

[math]\displaystyle{ r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n); }[/math]

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.^[11]

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.

也就是每个符号的联合熵的极限。对于平稳源，这两个表达式得出的结果相同。^[12]

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of 模板:Em.

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of .

在信息论中谈论一种语言的“速率”或“熵”是很常见的，也是很合适的，比如当信源是英文散文时。信息源的速率与其冗余度以及可被压缩程度有关。

信道容量

Communications over a channel—such as an ethernet cable—is the primary motivation of information theory. However, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.

通过信道(例如，以太网电缆)进行通信是信息论的主要动机。然而，这样的信道往往不能产生信号的精确重建; 静默时段内、噪声、其他形式的信号损坏往往会使得信息质量的降低。

Consider the communications process over a discrete channel. A simple model of the process is shown below:

考虑离散信道上的通信过程。该过程的简单模型如下:

Channel model

[[File:Channel model.svg|center|800px|信道模型]

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let $p (y | x)$ be the conditional probability distribution function of Y given X. We will consider $p (y | x)$ to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of $f (x)$ , the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the 模板:Em and is given by:

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let x)}} be the conditional probability distribution function of Y given X. We will consider x)}} to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of , the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal, we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the and is given by:

这里X表示要发送的信息的空间（全集），Y表示单位时间内通过信道接收的信息的空间。设 $p (y | x)$ 是给定X的Y的条件概率分布函数。我们将 $p (y | x)$ 视为通信信道的固定属性(表示信道噪声的性质)。那么X和Y的联合分布完全取决于所选用的信道和 $f (x)$ ，以及通过信道发送的信息的边缘分布。在这些约束条件下，我们希望最大化信息速率或信号速率，可以通过信道进行通信。对此的适当度量为互信息，信道容量即为最大互信息，且由下式给出:

[math]\displaystyle{ C = \max_{f} I(X;Y).\! }[/math]

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

信道容量具有以下与以信息速率“R”进行通信有关的属性(其中“R”通常为每个符号的比特数)。对于任意信息速率R < C和编码错误ε > 0，存在足够大的长度为N和速率大于等于R的代码以及解码算法使得块错误的最大概率小于等于ε；即总是可以在任意小的块错误下进行传输。此外对于任何速率的“ R> C”，不可能以很小的块错误进行发送。

Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

Channel coding is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

信道编码就寻找一种接近最优的编码，它可以用于在噪声信道上以接近信道容量的速率传输数据，且编码错误很小。

特定信道容量模型

A continuous-time analog communications channel subject to Gaussian noise — see Shannon–Hartley theorem.

连续时间内受高斯噪声（Gaussian noise）限制的模拟通信信道（详细内容请参见Shannon–Hartley定理）。

A binary symmetric channel (BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of $1 - H b (p)$ bits per channel use, where $H b$ is the binary entropy function to the base-2 logarithm:

二进制对称通道（binary symmetric channel，BSC）是交叉概率为p的二进制输入、二进制输出（以概率p翻转输入位）通道。每个通道使用的BSC容量为 $1 - H b (p)$ 比特，其中 $H b$ 是以2为底的对数的二进制熵函数：

File:Binary symmetric channel.svg

A binary erasure channel (BEC) with erasure probability p is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is 1 − p bits per channel use.
二进制擦除通道（binary erasure channel，BEC）是擦除概率为“ p”的二进制输入、三进制输出通道。可能的通道输出为0、1和擦除符号'e'。擦除表示信息输入位的完全丢失。每个通道使用的BEC容量为1 − p比特。

File:Binary erasure channel.svg

在其他领域的应用

情报使用和安全应用

Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the ban, was used in the Ultra project, breaking the German Enigma machine code and hastening the end of World War II in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.

信息论的概念可以应用于密码学和密码分析。在Ultra的项目中就使用了图灵的信息单位 ban，破解了德国的恩尼格玛密码，加速了二战在欧洲的结束。香农定义了一个重要的概念，现在称为单一性距离（unicity distance），基于明文的冗余性尝试给出具有唯一可解密性所需的最少量的密文。

Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on asymmetric key algorithms or on most commonly used methods of symmetric key algorithms (sometimes called secret key algorithms), such as block ciphers. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.

信息论使我们觉得保密比最初看起来要困难得多。穷举法也可以破解基于非对称密钥算法或最常用的对称密钥算法(也称为密钥算法)，如分块加密。所有这些方法的安全性都来自以下假设：在一定的的时间内没有已知的攻击方法可以破解它们。

Information theoretic security refers to methods such as the one-time pad that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the key) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.

信息理论安全性指的是诸如一次性密钥之类的不易受到这种暴力攻击的方法。在这种情况下，可以确保明文和密文(以密钥为条件)之间的正条件互信息正确的传输，而明文和密文之间的无条件互信息仍为零，从而保证绝对安全的通信。换句话说，窃听者将无法通过获取密文而不是密钥的知识来改善其对原文本的猜测。但是，就像在其他任何密码系统中一样，即便时信息论中安全的方法必须小心正确的使用; 之所以Venona 项目能够破解苏联的一次性密钥，就是因为苏联不当地重复使用关键材料。

伪随机数的生成

Pseudorandom number generators are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed cryptographically secure pseudorandom number generators, but even they require random seeds external to the software to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is min-entropy, a value related to Shannon entropy through Rényi entropy; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.

伪随机数生成器在计算机语言库和应用程序中广泛应用。由于它们没有规避现代计算机设备和软件的确定性，因此普遍不适合用在密码学中。一类改进的随机数生成器称为加密安全的伪随机数生成器，但也需要软件外部的随机种子才能正常工作，这通过提取器来获得。用来度量提取器中充分随机性的概念是最小熵，该值通过瑞丽熵与香农熵关联；瑞丽熵还用于评估密码系统中的随机性。虽然相关，但具有较高香农熵的随机变量不一定适合在提取器中使用，因此也不能用在密码学中。

地震勘探

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.^[13]

One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing offer a major improvement of resolution and image clarity over previous analog methods.

信息论的一个早期商业应用是在地震石油勘探领域。在该领域的应用可以从期望的地震信号中剔除和分离不需要的噪声。与以前的模拟方法相比，信息论和数字信号处理大大提高了图像的分辨率和清晰度。^[14]

符号学

Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics.^[15]^:171^[16]^:137 Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing."^[15]^:91

Semioticians Doede Nauta and Winfried Nöth both considered Charles Sanders Peirce as having created a theory of information in his works on semiotics. Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing."

符号学家Doede Nauta和Winfried Nöth都认为Charles Sanders Peirce在他的符号学著作中创造了信息论。^[15]^:171^[16]^:137 Nauta将符号信息论定义为研究编码、过滤和信息处理的内部过程。^[15]^:91

Concepts from information theory such as redundancy and code control have been used by semioticians such as Umberto Eco and Ferruccio Rossi-Landi to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.^[17]

Concepts from information theory such as redundancy and code control have been used by semioticians such as Umberto Eco and Ferruccio Rossi-Landi to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.

信息论的概念（例如冗余和代码控制）已被符号学家如Umberto Eco和Ferruccio Rossi-Landi用来解释意识形态，将其作为消息传输的一种形式，占统治地位的社会阶层通过使用具有高度冗余性的标志来发出其信息，使得从符号中解码出来的消息只有一种，而不会时其他可能的消息。

其他应用

Information theory also has applications in Gambling and information theory, black holes, and bioinformatics.

信息论在赌博、黑洞信息悖论和生物信息学中也有应用。

参考资料

↑ ^1.0 ^1.1 F. Rieke; D. Warland; R Ruyter van Steveninck; W Bialek (1997). Spikes: Exploring the Neural Code. The MIT press. ISBN 978-0262681087.
↑ Delgado-Bonal, Alfonso; Martín-Torres, Javier (2016-11-03). "Human vision is determined based on information theory". Scientific Reports (in English). 6 (1): 36038. Bibcode:2016NatSR...636038D. doi:10.1038/srep36038. ISSN 2045-2322. PMC 5093619. PMID 27808236.
↑ cf; Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. (2001). "Bayesian inference of phylogeny and its impact on evolutionary biology". Science. 294 (5550): 2310–2314. Bibcode:2001Sci...294.2310H. doi:10.1126/science.1065889. PMID 11743192. S2CID 2138288.
↑ Allikmets, Rando; Wasserman, Wyeth W.; Hutchinson, Amy; Smallwood, Philip; Nathans, Jeremy; Rogan, Peter K. (1998). "Thomas D. Schneider], Michael Dean (1998) Organization of the ABCR gene: analysis of promoter and splice junction sequences". Gene. 215 (1): 111–122. doi:10.1016/s0378-1119(98)00269-8. PMID 9666097.
↑ Burnham, K. P. and Anderson D. R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Second Edition (Springer Science, New York) .
↑ Jaynes, E. T. (1957). "Information Theory and Statistical Mechanics". Phys. Rev. 106 (4): 620. Bibcode:1957PhRv..106..620J. doi:10.1103/physrev.106.620.
↑ Bennett, Charles H.; Li, Ming; Ma, Bin (2003). "Chain Letters and Evolutionary Histories". Scientific American. 288 (6): 76–81. Bibcode:2003SciAm.288f..76B. doi:10.1038/scientificamerican0603-76. PMID 12764940. Archived from the original on 2007-10-07. Retrieved 2008-03-11.
↑ David R. Anderson (November 1, 2003). "Some background on why people in the empirical sciences may want to better understand the information-theoretic methods" (PDF). Archived from the original (PDF) on July 23, 2011. Retrieved 2010-06-23.
↑ ^9.0 ^9.1 Fazlollah M. Reza (1994) [1961]. An Introduction to Information Theory. Dover Publications, Inc., New York. ISBN 0-486-68210-2. https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22.
↑ ^10.0 ^10.1 Robert B. Ash (1990) [1965]. Information Theory. Dover Publications, Inc.. ISBN 0-486-66521-6. https://books.google.com/books?id=ngZhvUfF0UIC&pg=PA16&dq=intitle:information+intitle:theory+inauthor:ash+conditional+uncertainty.
↑ Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann. ISBN 1-55860-369-7. https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1.
↑ Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann. ISBN 1-55860-369-7. https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1.
↑ Haggerty, Patrick E. (1981). "The corporation and innovation". Strategic Management Journal. 2 (2): 97–118. doi:10.1002/smj.4250020202.
↑ Haggerty, Patrick E. (1981). "The corporation and innovation". Strategic Management Journal. 2 (2): 97–118. doi:10.1002/smj.4250020202.
↑ ^15.0 ^15.1 ^15.2 ^15.3 Nauta, Doede (1972). The Meaning of Information. The Hague: Mouton. ISBN 9789027919960.
↑ ^16.0 ^16.1 Nöth, Winfried (January 2012). "Charles S. Peirce's theory of information: a theory of the growth of symbols and of knowledge". Cybernetics and Human Knowing. 19 (1–2): 137–161. {{cite journal}}: Invalid |ref=harv (help)
↑ Nöth, Winfried (1981). "Semiotics of ideology". Semiotica, Issue 148.

经典之作

Shannon, C.E. (1948), "A Mathematical Theory of Communication", Bell System Technical Journal, 27, pp. 379–423 & 623–656, July & October, 1948. PDF.
Notes and other formats.

R.V.L. Hartley, "Transmission of Information", Bell System Technical Journal, July 1928

Andrey Kolmogorov (1968), "Three approaches to the quantitative definition of information" in International Journal of Computer Mathematics.

其他期刊文章

J. L. Kelly, Jr., Betbubbles.com https://en.wikipedia.org/wiki/Defekte_Weblinks?dwl={{{url}}} Seite nicht mehr abrufbar], Suche in Webarchiven: Kategorie:Wikipedia:Weblink offline (andere Namensräume)[http://timetravel.mementoweb.org/list/2010/Kategorie:Wikipedia:Vorlagenfehler/Vorlage:Toter Link/URL_fehlt , "A New Interpretation of Information Rate" Bell System Technical Journal, Vol. 35, July 1956, pp. 917–26.

R. Landauer, IEEE.org, "Information is Physical" Proc. Workshop on Physics and Computation PhysComp'92 (IEEE Comp. Sci.Press, Los Alamitos, 1993) pp. 1–4.

Landauer, R. (1961). "Irreversibility and Heat Generation in the Computing Process" (PDF). IBM J. Res. Dev. 5 (3): 183–191. doi:10.1147/rd.53.0183.

Timme, Nicholas; Alford, Wesley; Flecker, Benjamin; Beggs, John M. (2012). "Multivariate information measures: an experimentalist's perspective". arXiv:1111.6857 [cs.IT].

信息论教材

Arndt, C. Information Measures, Information and its Description in Science and Engineering (Springer Series: Signals and Communication Technology), 2004,
- Ash, RB. Information Theory. New York: Interscience, 1965. . New York: Dover 1990.
  - Gallager, R. Information Theory and Reliable Communication. New York: John Wiley and Sons, 1968.
    - Goldman, S. Information Theory. New York: Prentice Hall, 1953. New York: Dover 1968 , 2005
      - Cover, Thomas; Thomas, Joy A. (2006). Elements of information theory (2nd ed.). New York: Wiley-Interscience. ISBN 0-471-24195-4.
      - Csiszar, I, Korner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems Akademiai Kiado: 2nd edition, 1997.
        MacKay, David J. C.. Information Theory, Inference, and Learning Algorithms Cambridge: Cambridge University Press, 2003.
        Mansuripur, M. Introduction to Information Theory. New York: Prentice Hall, 1987.
        McEliece, R. The Theory of Information and Coding". Cambridge, 2002.
        Pierce, JR. "An introduction to information theory: symbols, signals and noise". Dover (2nd Edition). 1961 (reprinted by Dover 1980).
        
        Reza, F. An Introduction to Information Theory. New York: McGraw-Hill 1961. New York: Dover 1994.
        Shannon, Claude; Weaver, Warren (1949). The Mathematical Theory of Communication. Urbana, Illinois: University of Illinois Press. ISBN 0-252-72548-4. LCCN 49-11922. http://monoskop.org/images/b/be/Shannon_Claude_E_Weaver_Warren_The_Mathematical_Theory_of_Communication_1963.pdf.
        
        Stone, JV. Chapter 1 of book "Information Theory: A Tutorial Introduction", University of Sheffield, England, 2014. .
        Yeung, RW. A First Course in Information Theory Kluwer Academic/Plenum Publishers, 2002. .
        Yeung, RW. Information Theory and Network Coding Springer 2008, 2002.

其他书籍

Leon Brillouin, Science and Information Theory, Mineola, N.Y.: Dover, [1956, 1962] 2004.
James Gleick, The Information: A History, a Theory, a Flood, New York: Pantheon, 2011.
A. I. Khinchin, Mathematical Foundations of Information Theory, New York: Dover, 1957.
H. S. Leff and A. F. Rex, Editors, Maxwell's Demon: Entropy, Information, Computing, Princeton University Press, Princeton, New Jersey (1990).
Robert K. Logan. What is Information? - Propagating Organization in the Biosphere, the Symbolosphere, the Technosphere and the Econosphere, Toronto: DEMO Publishing.

Tom Siegfried, The Bit and the Pendulum, Wiley, 2000.
Charles Seife, Decoding the Universe, Viking, 2006.
Jeremy Campbell, Grammatical Man, Touchstone/Simon & Schuster, 1982,
Henri Theil, Economics and Information Theory, Rand McNally & Company - Chicago, 1967.

Escolano, Suau, Bonev, Information Theory in Computer Vision and Pattern Recognition, Springer, 2009.
Vlatko Vedral, Decoding Reality: The Universe as Quantum Information, Oxford University Press 2010.

信息论大型开放式课程
Raymond W. Yeung, "Information Theory" (The Chinese University of Hong Kong)

外部链接

模板:Wikiquote

模板:Library resources box

模板:SpringerEOM
Lambert F. L. (1999), "Shuffled Cards, Messy Desks, and Disorderly Dorm Rooms - Examples of Entropy Increase? Nonsense!", Journal of Chemical Education
IEEE Information Theory Society and ITSOC Monographs, Surveys, and Reviews

模板:Cybernetics 模板:Compression methods 模板:Areas of mathematics 模板:Computer science

Category:Computer science

类别: 计算机科学

Category:Cybernetics

类别: 控制论

Category:Formal sciences

类别: 正规科学

Category:Information Age

类别: 信息时代

This page was moved from wikipedia:en:Information theory. Its edit history can be viewed at 信息论/edithistory

[Spikes-1] 1.0 ^1.1 F. Rieke; D. Warland; R Ruyter van Steveninck; W Bialek (1997). Spikes: Exploring the Neural Code. The MIT press. ISBN 978-0262681087.

[2] Delgado-Bonal, Alfonso; Martín-Torres, Javier (2016-11-03). "Human vision is determined based on information theory". Scientific Reports (in English). 6 (1): 36038. Bibcode:2016NatSR...636038D. doi:10.1038/srep36038. ISSN 2045-2322. PMC 5093619. PMID 27808236.

[3] ; Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. (2001). "Bayesian inference of phylogeny and its impact on evolutionary biology". Science. 294 (5550): 2310–2314. Bibcode:2001Sci...294.2310H. doi:10.1126/science.1065889. PMID 11743192. S2CID 2138288.

[4] Allikmets, Rando; Wasserman, Wyeth W.; Hutchinson, Amy; Smallwood, Philip; Nathans, Jeremy; Rogan, Peter K. (1998). "Thomas D. Schneider], Michael Dean (1998) Organization of the ABCR gene: analysis of promoter and splice junction sequences". Gene. 215 (1): 111–122. doi:10.1016/s0378-1119(98)00269-8. PMID 9666097.

[5] Burnham, K. P. and Anderson D. R. (2002) Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Second Edition (Springer Science, New York) .

[6] Jaynes, E. T. (1957). "Information Theory and Statistical Mechanics". Phys. Rev. 106 (4): 620. Bibcode:1957PhRv..106..620J. doi:10.1103/physrev.106.620.

[7] Bennett, Charles H.; Li, Ming; Ma, Bin (2003). "Chain Letters and Evolutionary Histories". Scientific American. 288 (6): 76–81. Bibcode:2003SciAm.288f..76B. doi:10.1038/scientificamerican0603-76. PMID 12764940. Archived from the original on 2007-10-07. Retrieved 2008-03-11.

[8] David R. Anderson (November 1, 2003). "Some background on why people in the empirical sciences may want to better understand the information-theoretic methods" (PDF). Archived from the original (PDF) on July 23, 2011. Retrieved 2010-06-23.

[Reza-9] 9.0 ^9.1 Fazlollah M. Reza (1994) [1961]. An Introduction to Information Theory. Dover Publications, Inc., New York. ISBN 0-486-68210-2. https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22.

[Ash-10] 10.0 ^10.1 Robert B. Ash (1990) [1965]. Information Theory. Dover Publications, Inc.. ISBN 0-486-66521-6. https://books.google.com/books?id=ngZhvUfF0UIC&pg=PA16&dq=intitle:information+intitle:theory+inauthor:ash+conditional+uncertainty.

[11] Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann. ISBN 1-55860-369-7. https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1.

[12] Jerry D. Gibson (1998). Digital Compression for Multimedia: Principles and Standards. Morgan Kaufmann. ISBN 1-55860-369-7. https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1.

[13] Haggerty, Patrick E. (1981). "The corporation and innovation". Strategic Management Journal. 2 (2): 97–118. doi:10.1002/smj.4250020202.

[14] Haggerty, Patrick E. (1981). "The corporation and innovation". Strategic Management Journal. 2 (2): 97–118. doi:10.1002/smj.4250020202.

[Nauta_1972-15] 15.0 ^15.1 ^15.2 ^15.3 Nauta, Doede (1972). The Meaning of Information. The Hague: Mouton. ISBN 9789027919960.

[N.C3.B6th_2012-16] 16.0 ^16.1 Nöth, Winfried (January 2012). "Charles S. Peirce's theory of information: a theory of the growth of symbols and of knowledge". Cybernetics and Human Knowing. 19 (1–2): 137–161. {{cite journal}}: Invalid |ref=harv (help)

[17] Nöth, Winfried (1981). "Semiotics of ideology". Semiotica, Issue 148.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]