For example, in the Brown Corpus of English, the word '''the''' is the most frequently occurring word, accounting for nearly 7% of all word occurrences. Consistent with [[齐普夫定律 Zipf's law|Zipf's law]], the second-ranked word, '''of''', accounts for slightly over 3.5% of words (36,411 occurrences), followed by '''and''' (28,852 occurrences). Only the 135 most frequent vocabulary items are needed to account for half of the Brown Corpus.<ref name="asasa">Fagan, Ramazan, David E. A. [https://pattern.swarma.org/paper?id=a5099ae4-6f3a-11ea-ae37-0242ac1a0005 "An introduction to textual econometrics"]. ''Handbook of Empirical Economics and Finance'', pp. 133–153; p. 139: "For example, in the Brown Corpus, consisting of over one million words, half of the word volume consists of repeated uses of only 135 words."</ref>
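The two observations above, a roughly 1/''r'' falloff in frequency with rank and a small set of top words covering half of all tokens, can be checked on any text with a few lines of code. In the simplest (exponent 1) form of Zipf's law, the word at rank ''r'' has frequency f(''r'') ≈ f(1)/''r'', so if '''the''' covers about 7% of tokens, the rank-2 word should cover about 7% / 2 = 3.5%, which is what '''of''' does in the Brown Corpus. The following Python sketch is illustrative only: the function names are hypothetical, it assumes naive lowercase regex tokenization, and it is not the procedure used to compile the Brown Corpus statistics (real counts depend heavily on tokenization choices).

```python
import re
from collections import Counter


def rank_frequencies(text):
    """Return (word, count) pairs sorted by descending count (rank 1 first).

    Assumption: a word is a run of lowercase letters or apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def words_covering(freqs, fraction):
    """Smallest number of top-ranked words whose counts reach `fraction` of all tokens."""
    total = sum(count for _, count in freqs)
    running = 0
    for k, (_, count) in enumerate(freqs, start=1):
        running += count
        if running >= fraction * total:
            return k
    return len(freqs)


def zipf_expected(freqs):
    """Pair each observed count with the exponent-1 Zipf prediction f(1) / r."""
    top = freqs[0][1]
    return [(word, count, top / rank)
            for rank, (word, count) in enumerate(freqs, start=1)]


if __name__ == "__main__":
    freqs = rank_frequencies("the cat and the dog and the bird")
    print(freqs[:2])                  # [('the', 3), ('and', 2)]
    print(words_covering(freqs, 0.5))  # 2
    for word, observed, predicted in zipf_expected(freqs):
        print(f"{word:>5} observed={observed} zipf={predicted:.2f}")
```

On a toy sentence the fit is of course loose; the point is the procedure. Applied to a million-word corpus, `words_covering(freqs, 0.5)` is the kind of computation behind the "135 words cover half the corpus" figure, and `zipf_expected` shows how far each rank deviates from the idealized 1/''r'' curve.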