# Zipf's law

| Property | Value |
| --- | --- |
| Parameters | $\displaystyle{ s \geq 0\, }$ (real), $\displaystyle{ N \in \{1,2,3\ldots\} }$ (integer) |
| Support | $\displaystyle{ k \in \{1,2,\ldots,N\} }$ |
| PMF | $\displaystyle{ \frac{1/k^s}{H_{N,s}} }$, where $H_{N,s}$ is the $N$th generalized harmonic number |
| CDF | $\displaystyle{ \frac{H_{k,s}}{H_{N,s}} }$ |
| Mean | $\displaystyle{ \frac{H_{N,s-1}}{H_{N,s}} }$ |
| Mode | $\displaystyle{ 1\, }$ |
| Variance | $\displaystyle{ \frac{H_{N,s-2}}{H_{N,s}}-\frac{H^2_{N,s-1}}{H^2_{N,s}} }$ |
| MGF | $\displaystyle{ \frac{1}{H_{N,s}}\sum\limits_{n=1}^N \frac{e^{nt}}{n^s} }$ |
| CF | $\displaystyle{ \frac{1}{H_{N,s}}\sum\limits_{n=1}^N \frac{e^{int}}{n^s} }$ |
| Entropy | $\displaystyle{ \frac{s}{H_{N,s}}\sum\limits_{k=1}^N\frac{\ln(k)}{k^s} +\ln(H_{N,s}) }$ |
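The closed forms for the mean, variance, and entropy listed above can be evaluated directly from generalized harmonic numbers. The following is a minimal sketch, not a reference implementation; the function names `harmonic` and `zipf_summary` are illustrative choices, and the harmonic number is computed by naive summation, which is adequate only for modest $N$.

```python
import math


def harmonic(n: int, s: float) -> float:
    """Generalized harmonic number H_{n,s} = sum_{k=1}^{n} 1/k^s."""
    return sum(k ** -s for k in range(1, n + 1))


def zipf_summary(N: int, s: float) -> dict:
    """Mean, variance and entropy (in nats) from the closed forms above."""
    h0 = harmonic(N, s)       # H_{N,s}
    h1 = harmonic(N, s - 1)   # H_{N,s-1}
    h2 = harmonic(N, s - 2)   # H_{N,s-2}
    mean = h1 / h0
    variance = h2 / h0 - (h1 / h0) ** 2
    entropy = (s / h0) * sum(math.log(k) / k ** s for k in range(1, N + 1))
    entropy += math.log(h0)
    return {"mean": mean, "variance": variance, "entropy": entropy}


# Example parameters (arbitrary): N = 10 elements, exponent s = 1.
summary = zipf_summary(N=10, s=1.0)
print(summary)
```

A convenient sanity check is $s = 0$, where the distribution is uniform on $\{1,\ldots,N\}$ and the formulas reduce to the familiar mean $(N+1)/2$, variance $(N^2-1)/12$, and entropy $\ln N$.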

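## Phenomena that follow the law

• Word frequencies: the law holds not only for a corpus as a whole but also for a single article
• Web page visit frequencies
• The relationship between the population of a city and its rank
• The incomes of the top 3% of earners
• Earthquake magnitudes
• Fragment sizes when a solid breaks apart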

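## Theoretical review

Let:

• $N$ be the number of elements considered
• $k$ be their rank
• $s$ be the value of the exponent characterizing the distribution

Zipf's law then predicts that, in a population of $N$ elements, the normalized frequency of the element of rank $k$ is

$\displaystyle{ f(k;s,N)=\frac{1/k^s}{\sum\limits_{n=1}^N (1/n^s)} }$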
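As an illustration of the formula above, the normalized frequency can be computed by dividing $1/k^s$ by the sum of $1/n^s$ over all ranks. This is a minimal sketch; the function name `zipf_pmf` and the example parameters are assumptions made for illustration.

```python
def zipf_pmf(k: int, s: float, N: int) -> float:
    """Normalized frequency f(k; s, N) of the element of rank k."""
    if not 1 <= k <= N:
        raise ValueError("rank k must satisfy 1 <= k <= N")
    # Denominator: sum over all ranks n = 1..N of 1/n^s.
    normalizer = sum(1.0 / n ** s for n in range(1, N + 1))
    return (1.0 / k ** s) / normalizer


# Example parameters (arbitrary): five elements, exponent s = 1.
probs = [zipf_pmf(k, s=1.0, N=5) for k in range(1, 6)]
print(probs)
```

With $s = 1$ the element of rank 1 is twice as frequent as the element of rank 2 and three times as frequent as the element of rank 3, and the frequencies sum to 1 by construction.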

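## Related laws

The Zipf–Mandelbrot law generalizes Zipf's law by shifting the rank by a parameter $q$:

$\displaystyle{ f(k;N,q,s)=\frac{[\text{constant}]}{(k+q)^s}.\, }$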
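The shifted form above can be sketched in code as follows. The source leaves the constant unspecified; this sketch assumes it is chosen so that the frequencies sum to 1 over the ranks $1,\ldots,N$. The function name `zipf_mandelbrot_pmf` is illustrative.

```python
def zipf_mandelbrot_pmf(k: int, N: int, q: float, s: float) -> float:
    """f(k; N, q, s) = constant / (k + q)^s.

    Assumption: the constant is 1 / sum_{n=1}^{N} (n + q)^(-s),
    so that the frequencies over ranks 1..N sum to 1.
    """
    if not 1 <= k <= N:
        raise ValueError("rank k must satisfy 1 <= k <= N")
    normalizer = sum(1.0 / (n + q) ** s for n in range(1, N + 1))
    return (1.0 / (k + q) ** s) / normalizer


# Example parameters (arbitrary): N = 5, s = 1, with and without a shift.
unshifted = [zipf_mandelbrot_pmf(k, N=5, q=0.0, s=1.0) for k in range(1, 6)]
shifted = [zipf_mandelbrot_pmf(k, N=5, q=2.0, s=1.0) for k in range(1, 6)]
print(unshifted)
print(shifted)
```

Setting $q = 0$ recovers the unshifted Zipf frequencies, while a positive $q$ flattens the head of the distribution, so the top-ranked element receives a smaller share.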

| $\displaystyle{ n }$ | Benford's law: $\displaystyle{ P(n) = \log_{10}(n+1)-\log_{10}(n) }$ | $\displaystyle{ \frac{\log(P(n)/P(n-1))}{\log(n/(n-1))} }$ |
| --- | --- | --- |
| 1 | 0.30103000 | |
| 2 | 0.17609126 | −0.7735840 |
| 3 | 0.12493874 | −0.8463832 |
| 4 | 0.09691001 | −0.8830605 |
| 5 | 0.07918125 | −0.9054412 |
| 6 | 0.06694679 | −0.9205788 |
| 7 | 0.05799195 | −0.9315169 |
| 8 | 0.05115252 | −0.9397966 |
| 9 | 0.04575749 | −0.9462848 |
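The values in the table above can be reproduced with a few lines of code. This is a minimal sketch with illustrative function names: `benford` evaluates $P(n)$, and `local_exponent` evaluates the third column, which is the slope between consecutive points on a log-log plot of $P(n)$ against $n$.

```python
import math


def benford(n: int) -> float:
    """Benford probability of leading digit n: log10(n + 1) - log10(n)."""
    return math.log10(n + 1) - math.log10(n)


def local_exponent(n: int) -> float:
    """Log-log slope between digits n - 1 and n (defined for n >= 2)."""
    return math.log(benford(n) / benford(n - 1)) / math.log(n / (n - 1))


for n in range(1, 10):
    # The slope is undefined for n = 1, so that cell is left empty.
    slope = f"{local_exponent(n):.7f}" if n > 1 else ""
    print(f"{n} {benford(n):.8f} {slope}")
```

The slopes move toward −1 as $n$ increases, which is the exponent a Zipf-like law with $s = 1$ would give.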
