{{distinguish|Information science}}
{{Information theory}}
'''Information theory''' studies the [[quantification (science)|quantification]], [[computer data storage|storage]], and [[telecommunication|communication]] of [[information]]. It was originally proposed by [[Claude Shannon]] in 1948 to find fundamental limits on [[signal processing]] and communication operations such as [[data compression]], in a landmark paper titled "[[A Mathematical Theory of Communication]]". Its impact has been crucial to the success of the [[Voyager program|Voyager]] missions to deep space, the invention of the [[compact disc]], the feasibility of mobile phones, the development of the Internet, the study of [[linguistics]] and of human perception, the understanding of [[black hole]]s, and numerous other fields.
The field is at the intersection of mathematics, [[statistics]], computer science, physics, [[Neuroscience|neurobiology]], [[information engineering (field)|information engineering]], and electrical engineering. The theory has also found applications in other areas, including [[statistical inference]], [[natural language processing]], [[cryptography]], [[neurobiology]],<ref name="Spikes">{{cite book|title=Spikes: Exploring the Neural Code|author1=F. Rieke|author2=D. Warland|author3=R Ruyter van Steveninck|author4=W Bialek|publisher=The MIT press|year=1997|isbn=978-0262681087}}</ref> [[human vision]],<ref>{{Cite journal|last=Delgado-Bonal|first=Alfonso|last2=Martín-Torres|first2=Javier|date=2016-11-03|title=Human vision is determined based on information theory|journal=Scientific Reports|language=En|volume=6|issue=1|pages=36038|bibcode=2016NatSR...636038D|doi=10.1038/srep36038|issn=2045-2322|pmc=5093619|pmid=27808236}}</ref> the evolution<ref>{{cite journal|last1=cf|last2=Huelsenbeck|first2=J. P.|last3=Ronquist|first3=F.|last4=Nielsen|first4=R.|last5=Bollback|first5=J. P.|year=2001|title=Bayesian inference of phylogeny and its impact on evolutionary biology|url=|journal=Science|volume=294|issue=5550|pages=2310–2314|bibcode=2001Sci...294.2310H|doi=10.1126/science.1065889|pmid=11743192}}</ref> and function<ref>{{cite journal|last1=Allikmets|first1=Rando|last2=Wasserman|first2=Wyeth W.|last3=Hutchinson|first3=Amy|last4=Smallwood|first4=Philip|last5=Nathans|first5=Jeremy|last6=Rogan|first6=Peter K.|year=1998|title=Thomas D. Schneider], Michael Dean (1998) Organization of the ABCR gene: analysis of promoter and splice junction sequences|url=http://alum.mit.edu/www/toms/|journal=Gene|volume=215|issue=1|pages=111–122|doi=10.1016/s0378-1119(98)00269-8|pmid=9666097}}</ref> of molecular codes ([[bioinformatics]]), [[model selection]] in statistics,<ref>Burnham, K. P. and Anderson D. R. (2002) ''Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Second Edition'' (Springer Science, New York) {{ISBN|978-0-387-95364-9}}.</ref> [[thermal physics]],<ref>{{cite journal|last1=Jaynes|first1=E. T.|year=1957|title=Information Theory and Statistical Mechanics|url=http://bayes.wustl.edu/|journal=Phys. Rev.|volume=106|issue=4|page=620|bibcode=1957PhRv..106..620J|doi=10.1103/physrev.106.620}}</ref> [[quantum computing]], linguistics, [[plagiarism detection]],<ref>{{cite journal|last1=Bennett|first1=Charles H.|last2=Li|first2=Ming|last3=Ma|first3=Bin|year=2003|title=Chain Letters and Evolutionary Histories|url=http://sciamdigital.com/index.cfm?fa=Products.ViewIssuePreview&ARTICLEID_CHAR=08B64096-0772-4904-9D48227D5C9FAC75|journal=Scientific American|volume=288|issue=6|pages=76–81|bibcode=2003SciAm.288f..76B|doi=10.1038/scientificamerican0603-76|pmid=12764940|access-date=2008-03-11|archive-url=https://web.archive.org/web/20071007041539/http://www.sciamdigital.com/index.cfm?fa=Products.ViewIssuePreview&ARTICLEID_CHAR=08B64096-0772-4904-9D48227D5C9FAC75|archive-date=2007-10-07|url-status=dead}}</ref> [[pattern recognition]], and [[anomaly detection]].<ref>{{Cite web|url=http://aicanderson2.home.comcast.net/~aicanderson2/home.pdf|title=Some background on why people in the empirical sciences may want to better understand the information-theoretic methods|author=David R. Anderson|date=November 1, 2003|archiveurl=https://web.archive.org/web/20110723045720/http://aicanderson2.home.comcast.net/~aicanderson2/home.pdf|archivedate=July 23, 2011|url-status=dead|accessdate=2010-06-23}}
</ref> Important sub-fields of information theory include [[source coding]], [[algorithmic complexity theory]], [[algorithmic information theory]], [[information-theoretic security]], [[Grey system theory]] and measures of information.
Applications of fundamental topics of information theory include [[lossless data compression]] (e.g. [[ZIP (file format)|ZIP files]]), [[lossy data compression]] (e.g. [[MP3]]s and [[JPEG]]s), and [[channel capacity|channel coding]] (e.g. for [[digital subscriber line|DSL]]). Information theory is used in [[information retrieval]], [[intelligence (information gathering)|intelligence gathering]], gambling, and even in musical composition.
A key measure in information theory is [[information entropy|entropy]]. Entropy quantifies the amount of uncertainty involved in the value of a [[random variable]] or the outcome of a [[random process]]. For example, identifying the outcome of a fair [[coin flip]] (with two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a {{dice}} (with six equally likely outcomes). Some other important measures in information theory are [[mutual information]], channel capacity, [[error exponent]]s, and [[relative entropy]].
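For equally likely outcomes the entropy is simply the logarithm of the number of outcomes, so the two examples above work out to

:<math>H_\text{coin} = \log_2 2 = 1 \text{ bit}, \qquad H_\text{die} = \log_2 6 \approx 2.585 \text{ bits}.</math>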
==Overview==
Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", in which "information" is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise. Shannon's main result, the [[noisy-channel coding theorem]] showed that, in the limit of many channel uses, the rate of information that is asymptotically achievable is equal to the channel capacity, a quantity dependent merely on the statistics of the channel over which the messages are sent.<ref name="Spikes" />
Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of [[Rubric (academic)|rubrics]] throughout the world over the past half century or more: [[adaptive system]]s, [[anticipatory system]]s, [[artificial intelligence]], [[complex system]]s, [[complexity science]], [[cybernetics]], [[Informatics (academic field)|informatics]], [[machine learning]], along with [[systems science]]s of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of [[coding theory]].
Coding theory is concerned with finding explicit methods, called ''codes'', for increasing the efficiency and reducing the error rate of data communication over noisy channels to near the channel capacity. These codes can be roughly subdivided into data compression (source coding) and [[error-correction]] (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible.
A third class of information theory codes are cryptographic algorithms (both [[code (cryptography)|code]]s and [[cipher]]s). Concepts, methods and results from coding theory and information theory are widely used in cryptography and [[cryptanalysis]]. ''See the article [[ban (unit)]] for a historical application.''
==Historical background==
{{Main|History of information theory}}
The landmark event that ''established'' the discipline of information theory and brought it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the ''[[Bell System Technical Journal]]'' in July and October 1948.
Prior to this paper, limited information-theoretic ideas had been developed at [[Bell Labs]], all implicitly assuming events of equal probability. [[Harry Nyquist]]'s 1924 paper, ''Certain Factors Affecting Telegraph Speed'', contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation {{math|1=''W'' = ''K'' log ''m''}} (recalling [[Boltzmann's constant]]), where ''W'' is the speed of transmission of intelligence, ''m'' is the number of different voltage levels to choose from at each time step, and ''K'' is a constant. [[Ralph Hartley]]'s 1928 paper, ''Transmission of Information'', uses the word ''information'' as a measurable quantity, reflecting the receiver's ability to distinguish one [[sequence of symbols]] from any other, thus quantifying information as {{math|1=''H'' = log ''S''<sup>''n''</sup> = ''n'' log ''S''}}, where ''S'' was the number of possible symbols, and ''n'' the number of symbols in a transmission. The unit of information was therefore the [[decimal digit]], which has since sometimes been called the [[Hartley (unit)|hartley]] in his honor as a unit or scale or measure of information. [[Alan Turing]] in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war [[Cryptanalysis of the Enigma|Enigma]] ciphers.
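For instance, under Hartley's measure a message of {{math|1=''n'' = 3}} symbols drawn from an alphabet of {{math|1=''S'' = 10}} symbols carries {{math|1=''H'' = 3 log<sub>10</sub> 10 = 3}} hartleys, or about {{math|3 log<sub>2</sub> 10 ≈ 9.97}} bits.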
Much of the mathematics behind information theory for events of different probabilities was developed for the field of [[thermodynamics]] by [[Ludwig Boltzmann]] and [[J. Willard Gibbs]]. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by [[Rolf Landauer]] in the 1960s, are explored in ''[[Entropy in thermodynamics and information theory]]''.
In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that
:"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."
"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."
通信的基本问题是在一点上精确地或近似地再现在另一点上选择的信息
With it came the ideas of
* the information entropy and [[redundancy (information theory)|redundancy]] of a source, and its relevance through the [[source coding theorem]];
* the mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
* the practical result of the [[Shannon–Hartley law]] for the channel capacity of a [[Gaussian channel]]; as well as
* the [[bit]]—a new way of seeing the most fundamental unit of information.
==Quantities of information==
{{Main|Quantities of information}}
Information theory is based on [[probability theory]] and statistics. Information theory often concerns itself with measures of information of the distributions associated with random variables. Important quantities of information are entropy, a measure of information in a single random variable, and mutual information, a measure of information in common between two random variables. The former quantity is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. The latter is a property of the joint distribution of two random variables, and is the maximum rate of reliable communication across a noisy [[Communication channel|channel]] in the limit of long block lengths, when the channel statistics are determined by the joint distribution.
The choice of logarithmic base in the following formulae determines the [[units of measurement|unit]] of information entropy that is used. A common unit of information is the bit, based on the [[binary logarithm]]. Other units include the [[nat (unit)|nat]], which is based on the [[natural logarithm]], and the [[deciban|decimal digit]], which is based on the [[common logarithm]].
In what follows, an expression of the form {{math|''p'' log ''p''}} is considered by convention to be equal to zero whenever {{math|1=''p'' = 0}}. This is justified because <math>\lim_{p \rightarrow 0+} p \log p = 0</math> for any logarithmic base.
===Entropy of an information source===
Based on the [[probability mass function]] of each source symbol to be communicated, the Shannon [[Entropy (information theory)|entropy]] {{math|''H''}}, in units of bits (per symbol), is given by
:<math>H = - \sum_{i} p_i \log_2 (p_i)</math>
where {{math|''p<sub>i</sub>''}} is the probability of occurrence of the {{math|''i''}}-th possible value of the source symbol. This equation gives the entropy in the units of "bits" (per symbol) because it uses a logarithm of base 2, and this base-2 measure of entropy has sometimes been called the [[Shannon (unit)|shannon]] in his honor. Entropy is also commonly computed using the natural logarithm (base [[E (mathematical constant)|{{mvar|e}}]], where {{mvar|e}} is Euler's number), which produces a measurement of entropy in nats per symbol and sometimes simplifies the analysis by avoiding the need to include extra constants in the formulas. Other bases are also possible, but less commonly used. For example, a logarithm of base {{nowrap|1=2<sup>8</sup> = 256}} will produce a measurement in [[byte]]s per symbol, and a logarithm of base 10 will produce a measurement in decimal digits (or hartleys) per symbol.
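The definition can also be evaluated numerically; the short Python sketch below (the function name is illustrative) computes the sum in a chosen base:

<syntaxhighlight lang="python">
import math

def shannon_entropy(pmf, base=2):
    """Entropy of a discrete distribution given as a list of probabilities.

    base=2 gives bits (shannons), base=math.e gives nats, base=10 gives hartleys.
    """
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

print(shannon_entropy([0.5, 0.5]))           # 1.0 bit    (fair coin)
print(shannon_entropy([1/6] * 6))            # ~2.585 bits (fair die)
print(shannon_entropy([0.5, 0.5], math.e))   # ~0.693 nats
</syntaxhighlight>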
Intuitively, the entropy {{math|''H<sub>X</sub>''}} of a discrete random variable {{math|''X''}} is a measure of the amount of ''uncertainty'' associated with the value of {{math|''X''}} when only its distribution is known.
The entropy of a source that emits a sequence of {{math|''N''}} symbols that are [[independent and identically distributed]] (iid) is {{math|''N'' ⋅ ''H''}} bits (per message of {{math|''N''}} symbols). If the source data symbols are identically distributed but not independent, the entropy of a message of length {{math|''N''}} will be less than {{math|''N'' ⋅ ''H''}}.
[[File:Binary entropy plot.svg|thumbnail|right|200px|The entropy of a [[Bernoulli trial]] as a function of success probability, often called the {{em|[[binary entropy function]]}}, {{math|''H''<sub>b</sub>(''p'')}}. The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.]]
If one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, it is clear that no information is transmitted. If, however, each bit is independently equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted. Between these two extremes, information can be quantified as follows. If 𝕏 is the set of all messages {{math|{{mset|''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub>}}}} that {{math|''X''}} could be, and {{math|''p''(''x'')}} is the probability of some <math>x \in \mathbb X</math>, then the entropy, {{math|''H''}}, of {{math|''X''}} is defined:<ref name = Reza>{{cite book | title = An Introduction to Information Theory | author = Fazlollah M. Reza | publisher = Dover Publications, Inc., New York | origyear = 1961| year = 1994 | isbn = 0-486-68210-2 | url = https://books.google.com/books?id=RtzpRAiX6OgC&pg=PA8&dq=intitle:%22An+Introduction+to+Information+Theory%22++%22entropy+of+a+simple+source%22}}</ref>
:<math> H(X) = \mathbb{E}_{X} [I(x)] = -\sum_{x \in \mathbb{X}} p(x) \log p(x).</math>
(Here, {{math|''I''(''x'')}} is the [[self-information]], which is the entropy contribution of an individual message, and {{math|𝔼<sub>''X''</sub>}} is the [[expected value]].) A property of entropy is that it is maximized when all the messages in the message space are equiprobable {{math|1=''p''(''x'') = 1/''n''}}; i.e., most unpredictable, in which case {{math|1=''H''(''X'') = log ''n''}}.
The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2, thus having the shannon (Sh) as unit:
:<math>H_{\mathrm{b}}(p) = - p \log_2 p - (1-p)\log_2 (1-p).</math>
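For example, {{math|1=''H''<sub>b</sub>(0.1) = −0.1 log<sub>2</sub> 0.1 − 0.9 log<sub>2</sub> 0.9 ≈ 0.469}} Sh: a heavily biased coin is far more predictable than a fair one, for which {{math|1=''H''<sub>b</sub>(0.5) = 1}} Sh.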
===Joint entropy===
The {{em|[[joint entropy]]}} of two discrete random variables {{math|''X''}} and {{math|''Y''}} is merely the entropy of their pairing: {{math|(''X'', ''Y'')}}. This implies that if {{math|''X''}} and {{math|''Y''}} are [[statistical independence|independent]], then their joint entropy is the sum of their individual entropies.
For example, if {{math|(''X'', ''Y'')}} represents the position of a chess piece — {{math|''X''}} the row and {{math|''Y''}} the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.
:<math>H(X, Y) = \mathbb{E}_{X,Y} [-\log p(x,y)] = - \sum_{x, y} p(x, y) \log p(x, y) \,</math>
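In the chess example, if the piece is equally likely to stand on any of the 64 squares, the row and column are independent and uniform with {{math|1=''H''(''X'') = ''H''(''Y'') = log<sub>2</sub> 8 = 3}} bits, so

:<math>H(X, Y) = \log_2 64 = 6 \text{ bits} = H(X) + H(Y).</math>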
Despite similar notation, joint entropy should not be confused with {{em|[[cross entropy]]}}.
===Conditional entropy (equivocation)===
The {{em|[[conditional entropy]]}} or ''conditional uncertainty'' of {{math|''X''}} given random variable {{math|''Y''}} (also called the ''equivocation'' of {{math|''X''}} about {{math|''Y''}}) is the average conditional entropy over {{math|''Y''}}:<ref name=Ash>{{cite book | title = Information Theory | author = Robert B. Ash | publisher = Dover Publications, Inc. | origyear = 1965| year = 1990 | isbn = 0-486-66521-6 | url = https://books.google.com/books?id=ngZhvUfF0UIC&pg=PA16&dq=intitle:information+intitle:theory+inauthor:ash+conditional+uncertainty}}</ref>
:<math> H(X|Y) = \mathbb E_Y [H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x,y} p(x,y) \log p(x|y).</math>
Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:
: <math> H(X|Y) = H(X,Y) - H(Y) .\,</math>
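As an illustrative example, take the joint distribution on two binary variables with {{math|1=''p''(0,0) = 1/2}}, {{math|1=''p''(0,1) = ''p''(1,0) = 1/4}} and {{math|1=''p''(1,1) = 0}}. Then {{math|1=''H''(''X'', ''Y'') = 1.5}} bits and {{math|''H''(''Y'') ≈ 0.811}} bits, so {{math|''H''(''X''{{pipe}}''Y'') ≈ 0.689}} bits: knowing {{math|''Y''}} removes part, but not all, of the uncertainty about {{math|''X''}}.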
===Mutual information (transinformation)===
''[[Mutual information]]'' measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of {{math|''X''}} relative to {{math|''Y''}} is given by:
:<math>I(X;Y) = \mathbb{E}_{X,Y} [SI(x,y)] = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}</math>
where {{math|SI}} (''S''pecific mutual ''I''nformation) is the [[pointwise mutual information]].
A basic property of the mutual information is that
: <math>I(X;Y) = H(X) - H(X|Y).\,</math>
That is, knowing ''Y'', we can save an average of {{math|''I''(''X''; ''Y'')}} bits in encoding ''X'' compared to not knowing ''Y''.
Mutual information is [[symmetric function|symmetric]]:
: <math>I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y).\,</math>
Mutual information can be expressed as the average Kullback–Leibler divergence (information gain) between the [[posterior probability|posterior probability distribution]] of ''X'' given the value of ''Y'' and the [[prior probability|prior distribution]] on ''X'':
: <math>I(X;Y) = \mathbb E_{p(y)} [D_{\mathrm{KL}}( p(X|Y=y) \| p(X) )].</math>
In other words, this is a measure of how much, on the average, the probability distribution on ''X'' will change if we are given the value of ''Y''. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:
: <math>I(X; Y) = D_{\mathrm{KL}}(p(X,Y) \| p(X)p(Y)).</math>
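These identities can be checked numerically; the Python sketch below (reusing the illustrative joint distribution from the conditional-entropy example above) computes the mutual information both from the defining sum and from the entropy identity:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative joint pmf p(x, y) on {0,1} x {0,1}; rows index x, columns index y.
p_xy = np.array([[0.50, 0.25],
                 [0.25, 0.00]])

def H(p):
    """Shannon entropy in bits of an array of probabilities."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)   # marginal distributions

mask = p_xy > 0                                 # skip zero-probability terms
I_def = (p_xy[mask] * np.log2(p_xy[mask] / np.outer(p_x, p_y)[mask])).sum()
I_ident = H(p_x) + H(p_y) - H(p_xy)

print(round(I_def, 4), round(I_ident, 4))       # both ~0.1226 bits
</syntaxhighlight>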
Mutual information is closely related to the [[likelihood-ratio test|log-likelihood ratio test]] in the context of contingency tables and the [[multinomial distribution]] and to [[Pearson's chi-squared test|Pearson's χ<sup>2</sup> test]]: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.
===Kullback–Leibler divergence (information gain)===
The ''[[Kullback–Leibler divergence]]'' (or ''information divergence'', ''information gain'', or ''relative entropy'') is a way of comparing two distributions: a "true" [[probability distribution]] ''p(X)'', and an arbitrary probability distribution ''q(X)''. If we compress data in a manner that assumes ''q(X)'' is the distribution underlying some data, when, in reality, ''p(X)'' is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined
:<math>D_{\mathrm{KL}}(p(X) \| q(X)) = \sum_{x \in X} -p(x) \log {q(x)} \, - \, \sum_{x \in X} -p(x) \log {p(x)} = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.</math>
Although it is sometimes used as a 'distance metric', KL divergence is not a true [[Metric (mathematics)|metric]] since it is not symmetric and does not satisfy the [[triangle inequality]] (making it a semi-quasimetric).
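For instance, with {{math|1=''p'' = (1/2, 1/2)}} and {{math|1=''q'' = (3/4, 1/4)}},

:<math>D_{\mathrm{KL}}(p \| q) = \tfrac{1}{2} \log_2 \frac{1/2}{3/4} + \tfrac{1}{2} \log_2 \frac{1/2}{1/4} \approx 0.208 \text{ bits},</math>

whereas the divergence taken in the other direction, <math>D_{\mathrm{KL}}(q \| p) \approx 0.189</math> bits, differs, a concrete illustration of the asymmetry.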
Another interpretation of the KL divergence is the "unnecessary surprise" introduced by a prior from the truth: suppose a number ''X'' is about to be drawn randomly from a discrete set with probability distribution ''p(x)''. If Alice knows the true distribution ''p(x)'', while Bob believes (has a [[prior probability|prior]]) that the distribution is ''q(x)'', then Bob will be more [[Information content|surprised]] than Alice, on average, upon seeing the value of ''X''. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the ''log'' is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
===Other quantities===
Other important information theoretic quantities include [[Rényi entropy]] (a generalization of entropy), [[differential entropy]] (a generalization of quantities of information to continuous distributions), and the [[conditional mutual information]].
==Coding theory==
{{Main|Coding theory}}
[[File:CDSCRATCHES.jpg|thumb|right|A picture showing scratches on the readable surface of a CD-R. Music and data CDs are coded using error correcting codes and thus can still be read even if they have minor scratches using [[error detection and correction]].]]
Coding theory is one of the most important and direct applications of information theory. It can be subdivided into [[data compression|source coding]] theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
* Data compression (source coding): There are two formulations for the compression problem:
*[[lossless data compression]]: the data must be reconstructed exactly;
*[[lossy data compression]]: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of information theory is called ''[[rate–distortion theory]]''.
* Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.
This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the [[broadcast channel]]) or intermediary "helpers" (the [[relay channel]]), or more general [[computer network|networks]], compression followed by transmission may no longer be optimal. [[Network information theory]] refers to these multi-agent communication models.
===Source theory===
Any process that generates successive messages can be considered a {{em|[[Communication source|source]]}} of information. A memoryless source is one in which each message is an [[Independent identically distributed random variables|independent identically distributed random variable]], whereas the properties of [[ergodic theory|ergodicity]] and [[stationary process|stationarity]] impose less restrictive constraints. All such sources are [[stochastic process|stochastic]]. These terms are well studied in their own right outside information theory.
====Rate====<!-- This section is linked from [[Channel capacity]] -->
Information ''[[Entropy rate|rate]]'' is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is
:<math>r = \lim_{n \to \infty} H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);</math>
that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the ''average rate'' is
:<math>r = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots X_n);</math>
that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.<ref>{{cite book | title = Digital Compression for Multimedia: Principles and Standards | author = Jerry D. Gibson | publisher = Morgan Kaufmann | year = 1998 | url = https://books.google.com/books?id=aqQ2Ry6spu0C&pg=PA56&dq=entropy-rate+conditional#PPA57,M1 | isbn = 1-55860-369-7 }}</ref>
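For example, an i.i.d. fair-coin source has rate {{math|1=''r'' = 1}} bit per symbol, while a stationary binary [[Markov chain|Markov source]] that repeats its previous symbol with probability 0.9 and flips it with probability 0.1 has rate {{math|1=''r'' = ''H''<sub>b</sub>(0.1) ≈ 0.469}} bits per symbol.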
It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of {{em|source coding}}.
===Channel capacity===
{{Main|Channel capacity}}
Communications over a channel—such as an [[ethernet]] cable—is the primary motivation of information theory. However, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality.
Consider the communications process over a discrete channel. A simple model of the process is shown below:
[[File:Channel model.svg|center|800px|Channel model]]
Here ''X'' represents the space of messages transmitted, and ''Y'' the space of messages received during a unit time over our channel. Let {{math|''p''(''y''{{pipe}}''x'')}} be the [[conditional probability]] distribution function of ''Y'' given ''X''. We will consider {{math|''p''(''y''{{pipe}}''x'')}} to be an inherent fixed property of our communications channel (representing the nature of the ''[[Signal noise|noise]]'' of our channel). Then the joint distribution of ''X'' and ''Y'' is completely determined by our channel and by our choice of {{math|''f''(''x'')}}, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the ''[[Signal (electrical engineering)|signal]]'', we can communicate over the channel. The appropriate measure for this is the mutual information, and this maximum mutual information is called the {{em|channel capacity}} and is given by:
:<math> C = \max_{f} I(X;Y).\! </math>
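In general this maximization has no closed form, but for a discrete memoryless channel it can be carried out numerically with the [[Blahut–Arimoto algorithm]]. The Python sketch below (function and variable names are illustrative) does so and is checked against the binary symmetric channel, whose capacity {{math|1 − ''H''<sub>b</sub>(''p'')}} is stated further down:

<syntaxhighlight lang="python">
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
    """Capacity (in bits) of a discrete memoryless channel.

    W[x, y] = p(y | x); each row must sum to 1.
    Returns (capacity_in_bits, capacity-achieving input distribution).
    """
    n_in = W.shape[0]
    r = np.full(n_in, 1.0 / n_in)        # start from the uniform input law
    for _ in range(max_iter):
        q = r @ W                        # output law q(y) = sum_x r(x) W(x, y)
        # relative entropy D( W(.|x) || q ) for every input symbol x, in nats
        with np.errstate(divide="ignore", invalid="ignore"):
            d = np.where(W > 0, W * np.log(W / q), 0.0).sum(axis=1)
        c = np.exp(d)
        lower, upper = np.log(r @ c), np.log(c.max())   # bounds that sandwich C
        r = r * c
        r /= r.sum()
        if upper - lower < tol:
            break
    return lower / np.log(2), r          # convert nats to bits

# Binary symmetric channel with crossover probability 0.1:
# the closed form 1 - H_b(0.1) gives roughly 0.531 bits per channel use.
p = 0.1
W = np.array([[1 - p, p],
              [p, 1 - p]])
C, r_opt = blahut_arimoto(W)
print(round(float(C), 3), r_opt)         # ~0.531, [0.5 0.5]
</syntaxhighlight>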
This capacity has the following property related to communicating at information rate ''R'' (where ''R'' is usually bits per symbol). For any information rate ''R < C'' and coding error ε > 0, for large enough ''N'', there exists a code of length ''N'' and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate ''R > C'', it is impossible to transmit with arbitrarily small block error.
''[[Channel code|Channel coding]]'' is concerned with finding such nearly optimal codes that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.
====Capacity of particular channel models====
* A continuous-time analog communications channel subject to [[Gaussian noise]] — see [[Shannon–Hartley theorem]].
* A [[binary symmetric channel]] (BSC) with crossover probability ''p'' is a binary input, binary output channel that flips the input bit with probability ''p''. The BSC has a capacity of {{math|1 − ''H''<sub>b</sub>(''p'')}} bits per channel use, where {{math|''H''<sub>b</sub>}} is the binary entropy function to the base-2 logarithm (worked numerical values for this channel and the erasure channel below are given after the list):
::[[File:Binary symmetric channel.svg]]
* A [[binary erasure channel]] (BEC) with erasure probability ''p'' is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is {{nowrap|1 − ''p''}} bits per channel use.
::[[File:Binary erasure channel.svg]]
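As worked instances of the two formulas above, a BSC with crossover probability {{math|1=''p'' = 0.11}} has capacity {{math|1=1 − ''H''<sub>b</sub>(0.11) ≈ 0.5}} bits per channel use, while a BEC with erasure probability {{math|1=''p'' = 0.25}} has capacity {{math|1=1 − 0.25 = 0.75}} bits per channel use.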
==Applications to other fields==
===Intelligence uses and secrecy applications===
Information theoretic concepts apply to cryptography and cryptanalysis. Turing's information unit, the [[Ban (unit)|ban]], was used in the [[Ultra]] project, breaking the German [[Enigma machine]] code and hastening the [[Victory in Europe Day|end of World War II in Europe]]. Shannon himself defined an important concept now called the [[unicity distance]]. Based on the redundancy of the [[plaintext]], it attempts to give a minimum amount of [[ciphertext]] necessary to ensure unique decipherability.
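As an illustrative calculation (not part of the original article), the unicity distance is commonly estimated as ''U'' ≈ ''H''(''K'')/''D'', where ''H''(''K'') is the entropy of the key and ''D'' the redundancy per plaintext character. The sketch below applies this to a monoalphabetic substitution cipher over English, assuming an entropy rate of roughly 1.5 bits per letter (an assumed figure; published estimates vary).

<syntaxhighlight lang="python">
from math import log2, factorial

# Unicity distance estimate U ≈ H(K) / D for a monoalphabetic substitution cipher.
# Illustrative assumptions: 26-letter alphabet, English entropy rate of ~1.5 bits/letter.
key_entropy = log2(factorial(26))          # H(K) ≈ 88.4 bits: any permutation of the alphabet
redundancy_per_letter = log2(26) - 1.5     # D ≈ 3.2 bits of redundancy per plaintext letter
unicity_distance = key_entropy / redundancy_per_letter
print(round(unicity_distance))             # ≈ 28 letters of ciphertext
</syntaxhighlight>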
Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A [[brute force attack]] can break systems based on [[public-key cryptography|asymmetric key algorithms]] or on most commonly used methods of [[symmetric-key algorithm|symmetric key algorithms]] (sometimes called secret key algorithms), such as [[block cipher]]s. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.
[[Information theoretic security]] refers to methods such as the [[one-time pad]] that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information between the plaintext and ciphertext (conditioned on the [[key (cryptography)|key]]) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the [[Venona project]] was able to crack the one-time pads of the Soviet Union due to their improper reuse of key material.
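A minimal sketch of why the one-time pad achieves this (added for illustration; the 2-bit message space is arbitrary): when the key is uniformly random, every ciphertext is equally likely for every plaintext, so the ciphertext alone leaves an eavesdropper's distribution over plaintexts unchanged.

<syntaxhighlight lang="python">
from collections import Counter

# Exhaustive check on 2-bit messages: with a uniformly random key, the one-time pad
# produces the same (uniform) ciphertext distribution for every plaintext, so
# I(plaintext; ciphertext) = 0 when the key is unknown.
messages = keys = range(4)                   # all 2-bit values 00..11
for m in messages:
    dist = Counter(m ^ k for k in keys)      # ciphertext distribution for this plaintext
    print(m, dict(dist))                     # each ciphertext appears exactly once

# Since p(c | m) does not depend on m, Bayes' rule gives p(m | c) = p(m): seeing the
# ciphertext without the key tells the eavesdropper nothing about the plaintext.
</syntaxhighlight>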
===Pseudorandom number generation===
[[Pseudorandom number generator]]s are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed [[cryptographically secure pseudorandom number generator]]s, but even they require [[random seed]]s external to the software to work as intended. These can be obtained via [[Extractor (mathematics)|extractors]], if done carefully. The measure of sufficient randomness in extractors is [[min-entropy]], a value related to Shannon entropy through [[Rényi entropy]]; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.
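To illustrate the distinction (an illustrative example, not from the original article), the sketch below compares Shannon entropy with min-entropy for a source that emits one 'heavy' value half the time and is otherwise uniform: the Shannon entropy looks generous, but the min-entropy, which is what an extractor needs, is only one bit.

<syntaxhighlight lang="python">
from math import log2

def shannon_entropy(p):
    """H(X) = -sum p_i log2 p_i, in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def min_entropy(p):
    """H_min(X) = -log2 max_i p_i, in bits (the order-infinity Rényi entropy)."""
    return -log2(max(p))

# One outcome with probability 1/2, plus 2**20 outcomes sharing the other half.
n = 2 ** 20
p = [0.5] + [0.5 / n] * n
print(shannon_entropy(p))  # ~11.0 bits: looks like plenty of randomness
print(min_entropy(p))      # 1.0 bit: an adversary guesses the heavy value half the time
</syntaxhighlight>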
===Seismic exploration===
One early commercial application of information theory was in the field of seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and [[digital signal processing]] offer a major improvement of resolution and image clarity over previous analog methods.<ref>{{cite journal|doi=10.1002/smj.4250020202 | volume=2 | issue=2 | title=The corporation and innovation | year=1981 | journal=Strategic Management Journal | pages=97–118 | last1 = Haggerty | first1 = Patrick E.}}</ref>
===Semiotics===
[[Semiotics|Semioticians]] [[:nl:Doede Nauta|Doede Nauta]] and [[Winfried Nöth]] both considered [[Charles Sanders Peirce]] as having created a theory of information in his works on semiotics.<ref name="Nauta 1972">{{cite book |ref=harv |last1=Nauta |first1=Doede |title=The Meaning of Information |date=1972 |publisher=Mouton |location=The Hague |isbn=9789027919960}}</ref>{{rp|171}}<ref name="Nöth 2012">{{cite journal |ref=harv |last1=Nöth |first1=Winfried |title=Charles S. Peirce's theory of information: a theory of the growth of symbols and of knowledge |journal=Cybernetics and Human Knowing |date=January 2012 |volume=19 |issue=1–2 |pages=137–161 |url=https://edisciplinas.usp.br/mod/resource/view.php?id=2311849}}</ref>{{rp|137}} Nauta defined semiotic information theory as the study of "the internal processes of coding, filtering, and information processing."<ref name="Nauta 1972"/>{{rp|91}}
Concepts from information theory such as redundancy and code control have been used by semioticians such as [[Umberto Eco]] and [[:it:Ferruccio Rossi-Landi|Ferruccio Rossi-Landi]] to explain ideology as a form of message transmission whereby a dominant social class emits its message by using signs that exhibit a high degree of redundancy such that only one message is decoded among a selection of competing ones.<ref>Nöth, Winfried (1981). "[https://kobra.uni-kassel.de/bitstream/handle/123456789/2014122246977/semi_2004_002.pdf?sequence=1&isAllowed=y Semiotics of ideology]". ''Semiotica'', Issue 148.</ref>
===Miscellaneous applications===
Information theory also has applications in [[Gambling and information theory|gambling]], [[black hole information paradox|black holes]], and [[bioinformatics]].
==See also==
{{Portal|Mathematics}}
* [[Algorithmic probability]]
* [[Bayesian inference]]
* [[Communication theory]]
* [[Constructor theory]] - a generalization of information theory that includes quantum information
* [[Inductive probability]]
* [[Info-metrics]]
* [[Minimum message length]]
* [[Minimum description length]]
* [[List of important publications in theoretical computer science#Information theory|List of important publications]]
* [[Philosophy of information]]
===Applications===
{{div col|colwidth=20em}}
* [[Active networking]]
* [[Cryptanalysis]]
* [[Cryptography]]
* [[Cybernetics]]
* [[Entropy in thermodynamics and information theory]]
* [[Gambling]]
* [[Intelligence (information gathering)]]
* [[reflection seismology|Seismic exploration]]
{{div col end}}
===History===
{{div col|colwidth=20em}}
* [[Ralph Hartley|Hartley, R.V.L.]]
* [[History of information theory]]
* [[Claude Elwood Shannon|Shannon, C.E.]]
* [[Timeline of information theory]]
* [[Hubert Yockey|Yockey, H.P.]]
{{div col end}}
===Theory===
{{div col|colwidth=20em}}
* [[Coding theory]]
* [[Detection theory]]
* [[Estimation theory]]
* [[Fisher information]]
* [[Information algebra]]
* [[Information asymmetry]]
* [[Information field theory]]
* [[Information geometry]]
* [[Information theory and measure theory]]
* [[Kolmogorov complexity]]
* [[List of unsolved problems in information theory]]
* [[Logic of information]]
* [[Network coding]]
* [[Philosophy of information]]
* [[Quantum information science]]
* [[Source coding]]
{{div col end}}
===Concepts===
{{div col|colwidth=20em}}
* [[Ban (unit)]]
* [[Channel capacity]]
* [[Communication channel]]
* [[Communication source]]
* [[Conditional entropy]]
* [[Covert channel]]
* [[Data compression]]
* Decoder
* [[Differential entropy]]
* [[Fungible information]]
* [[Information fluctuation complexity]]
* [[Information entropy]]
* [[Joint entropy]]
* [[Kullback–Leibler divergence]]
* [[Mutual information]]
* [[Pointwise mutual information]] (PMI)
* [[Receiver (information theory)]]
* [[Redundancy (information theory)|Redundancy]]
* [[Rényi entropy]]
* [[Self-information]]
* [[Unicity distance]]
* [[Variety (cybernetics)|Variety]]
* [[Hamming distance]]
{{div col end}}
==References==
{{Reflist}}
===The classic work===
{{refbegin}}
* [[Claude Elwood Shannon|Shannon, C.E.]] (1948), "[[A Mathematical Theory of Communication]]", ''Bell System Technical Journal'', 27, pp. 379–423 & 623–656, July & October, 1948. [http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf PDF.] <br />[http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html Notes and other formats.]
* R.V.L. Hartley, [http://www.dotrose.com/etext/90_Miscellaneous/transmission_of_information_1928b.pdf "Transmission of Information"], ''Bell System Technical Journal'', July 1928
* [[Andrey Kolmogorov]] (1968), "[https://www.tandfonline.com/doi/pdf/10.1080/00207166808803030 Three approaches to the quantitative definition of information]" in International Journal of Computer Mathematics.
{{refend}}
===Other journal articles===
{{refbegin}}
* J. L. Kelly, Jr., [http://betbubbles.com/wp-content/uploads/2017/07/kelly.pdf Betbubbles.com]{{Dead link|date=January 2020 |bot=InternetArchiveBot |fix-attempted=yes }}, "A New Interpretation of Information Rate" ''Bell System Technical Journal'', Vol. 35, July 1956, pp. 917–26.
* R. Landauer, [http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=615478 IEEE.org], "Information is Physical" ''Proc. Workshop on Physics and Computation PhysComp'92'' (IEEE Comp. Sci.Press, Los Alamitos, 1993) pp. 1–4.
* {{cite journal | last1 = Landauer | first1 = R. | year = 1961 | title = Irreversibility and Heat Generation in the Computing Process | url = http://www.research.ibm.com/journal/rd/441/landauerii.pdf | journal = IBM J. Res. Dev. | volume = 5 | issue = 3| pages = 183–191 | doi = 10.1147/rd.53.0183 }}
* {{cite arXiv |last=Timme |first=Nicholas|last2=Alford |first2=Wesley|last3=Flecker |first3=Benjamin|last4=Beggs |first4=John M.|date=2012 |title=Multivariate information measures: an experimentalist's perspective |eprint=1111.6857|class=cs.IT}}
{{refend}}
===Textbooks on information theory===
{{refbegin}}
* Arndt, C. ''Information Measures, Information and its Description in Science and Engineering'' (Springer Series: Signals and Communication Technology), 2004, {{isbn|978-3-540-40855-0}}
* Ash, RB. ''Information Theory''. New York: Interscience, 1965. {{isbn|0-470-03445-9}}. New York: Dover 1990. {{isbn|0-486-66521-6}}
* [[Gallager, R]]. ''Information Theory and Reliable Communication.'' New York: John Wiley and Sons, 1968. {{isbn|0-471-29048-3}}
* Goldman, S. ''Information Theory''. New York: Prentice Hall, 1953. New York: Dover 1968 {{isbn|0-486-62209-6}}, 2005 {{isbn|0-486-44271-3}}
* {{cite book |last1=Cover |first1=Thomas |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of information theory |edition=2nd |location=New York |publisher=[[Wiley-Interscience]] |date=2006 |isbn=0-471-24195-4}}
* [[Csiszar, I]], Korner, J. ''Information Theory: Coding Theorems for Discrete Memoryless Systems'' Akademiai Kiado: 2nd edition, 1997. {{isbn|963-05-7440-3}}
* [[David J. C. MacKay|MacKay, David J. C.]]. ''[http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms]'' Cambridge: Cambridge University Press, 2003. {{isbn|0-521-64298-1}}
* Mansuripur, M. ''Introduction to Information Theory''. New York: Prentice Hall, 1987. {{isbn|0-13-484668-0}}
* [[Robert McEliece|McEliece, R]]. ''The Theory of Information and Coding''. Cambridge, 2002. {{isbn|978-0521831857}}
* Pierce, JR. ''An introduction to information theory: symbols, signals and noise''. Dover (2nd Edition). 1961 (reprinted by Dover 1980).
* [[Reza, F]]. ''An Introduction to Information Theory''. New York: McGraw-Hill 1961. New York: Dover 1994. {{isbn|0-486-68210-2}}
* {{cite book |last1=Shannon |first1=Claude |author-link1=Claude Shannon |last2=Weaver |first2=Warren |author-link2=Warren Weaver |date=1949 |title=The Mathematical Theory of Communication |url=http://monoskop.org/images/b/be/Shannon_Claude_E_Weaver_Warren_The_Mathematical_Theory_of_Communication_1963.pdf |location=[[Urbana, Illinois]] |publisher=[[University of Illinois Press]] |lccn=49-11922 |isbn=0-252-72548-4}}
* Stone, JV. Chapter 1 of book [http://jim-stone.staff.shef.ac.uk/BookInfoTheory/InfoTheoryBookMain.html "Information Theory: A Tutorial Introduction"], University of Sheffield, England, 2014. {{isbn|978-0956372857}}.
* Yeung, RW. ''[http://iest2.ie.cuhk.edu.hk/~whyeung/book/ A First Course in Information Theory]'' Kluwer Academic/Plenum Publishers, 2002. {{isbn|0-306-46791-7}}.
* Yeung, RW. ''[http://iest2.ie.cuhk.edu.hk/~whyeung/book2/ Information Theory and Network Coding]'' Springer 2008, 2002. {{isbn|978-0-387-79233-0}}
{{refend}}
===Other books===
{{refbegin}}
* Leon Brillouin, ''Science and Information Theory'', Mineola, N.Y.: Dover, [1956, 1962] 2004. {{isbn|0-486-43918-6}}
* [[James Gleick]], ''[[The Information: A History, a Theory, a Flood]]'', New York: Pantheon, 2011. {{isbn|978-0-375-42372-7}}
* A. I. Khinchin, ''Mathematical Foundations of Information Theory'', New York: Dover, 1957. {{isbn|0-486-60434-9}}
* H. S. Leff and A. F. Rex, Editors, ''Maxwell's Demon: Entropy, Information, Computing'', Princeton University Press, Princeton, New Jersey (1990). {{isbn|0-691-08727-X}}
* [[Robert K. Logan]]. ''What is Information? - Propagating Organization in the Biosphere, the Symbolosphere, the Technosphere and the Econosphere'', Toronto: DEMO Publishing.
* Tom Siegfried, ''The Bit and the Pendulum'', Wiley, 2000. {{isbn|0-471-32174-5}}
* Charles Seife, ''[[Decoding the Universe]]'', Viking, 2006. {{isbn|0-670-03441-X}}
* Jeremy Campbell, ''[[Grammatical Man]]'', Touchstone/Simon & Schuster, 1982, {{isbn|0-671-44062-4}}
* Henri Theil, ''Economics and Information Theory'', Rand McNally & Company - Chicago, 1967.
* Escolano, Suau, Bonev, ''[https://www.springer.com/computer/image+processing/book/978-1-84882-296-2 Information Theory in Computer Vision and Pattern Recognition]'', Springer, 2009. {{isbn|978-1-84882-296-2}}
* Vlatko Vedral, ''Decoding Reality: The Universe as Quantum Information'', Oxford University Press 2010. {{ISBN|0-19-923769-7}}
{{refend}}
===MOOC on information theory===
* Raymond W. Yeung, "[http://www.inc.cuhk.edu.hk/InformationTheory/index.html Information Theory]" ([[The Chinese University of Hong Kong]])
==External links==
{{Wikiquote}}
{{Library resources box}}
* {{SpringerEOM |title=Information |id=p/i051040}}
* Lambert F. L. (1999), "[http://jchemed.chem.wisc.edu/Journal/Issues/1999/Oct/abs1385.html Shuffled Cards, Messy Desks, and Disorderly Dorm Rooms - Examples of Entropy Increase? Nonsense!]", ''Journal of Chemical Education''
* [http://www.itsoc.org/ IEEE Information Theory Society] and [https://www.itsoc.org/resources/surveys ITSOC Monographs, Surveys, and Reviews]
{{Cybernetics}}
{{Compression methods}}
{{Areas of mathematics}}
{{Computer science}}
{{Authority control}}
{{DEFAULTSORT:Information Theory}}
[[Category:Information theory| ]]
[[Category:Computer science]]
[[Category:Cybernetics]]
[[Category:Formal sciences]]
[[Category:Information Age]]
<noinclude>
<small>This page was moved from [[wikipedia:en:Information theory]]. Its edit history can be viewed at [[信息论/edithistory]]</small></noinclude>
[[Category:待整理页面]]