14 October 2020 (Wed), 21:48
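==Sentence structure==
For texts made up of fairly short sentences, statistics such as the frequencies of n-gram tokens (sequences of n consecutive words) are enough to characterize the text. For longer sentences, a more precise analysis requires looking at the syntactic structure of the sentence and the part of speech of each word. For example, the following table lists the symbols used for syntactic categories and parts of speech: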
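{| class="wikitable"
! Symbol !! Meaning !! Example
|-
| S || sentence || the man walked
|-
| NP || noun phrase || a dog
|-
| VP || verb phrase || saw a park
|-
| PP || prepositional phrase || with a telescope
|-
| Det || determiner || the
|-
| N || noun || dog
|-
| V || verb || walked
|-
| P || preposition || in
|}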
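As a rough illustration of the n-gram statistics mentioned at the start of this section, the sketch below counts n-grams using only the Python standard library. The helper name <code>ngram_counts</code> and the sample text are made up for this example, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Sample text invented for this illustration
text = "the man saw the dog and the man saw the cat"
tokens = text.split()  # naive whitespace tokenization

bigrams = ngram_counts(tokens, 2)
for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```

Counts like these describe which word sequences are frequent, but they say nothing about how the words group into phrases, which is why longer sentences call for the syntactic analysis described here.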
==Parsing and visualization==
To break a sentence apart and show its structure, we can use the nltk package as follows:
<syntaxhighlight lang="python">
import nltk

# Toy context-free grammar. CFG.fromstring is the NLTK 3 replacement
# for the older nltk.parse_cfg call.
grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")

sent = "the man saw Bob with the telescope".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)

# parse() yields every tree the grammar allows for the sentence
# (it replaces the older nbest_parse method).
for tree in rd_parser.parse(sent):
    print(tree)

# Build the same tree by hand and draw it in a window (requires Tkinter).
tree1 = nltk.Tree('S', [
    nltk.Tree('NP', [nltk.Tree('Det', ['the']), nltk.Tree('N', ['man'])]),
    nltk.Tree('VP', [
        nltk.Tree('V', ['saw']),
        nltk.Tree('NP', ['Bob']),
        nltk.Tree('PP', [
            nltk.Tree('P', ['with']),
            nltk.Tree('NP', [nltk.Tree('Det', ['the']), nltk.Tree('N', ['telescope'])]),
        ]),
    ]),
])
tree1.draw()
</syntaxhighlight>
This finally produces the following figure:
[[File:grammar_tree_1.png|500px]]
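To make the idea of recursive-descent parsing more concrete, here is a minimal pure-Python sketch that expands the same grammar top-down and collects every complete parse. It is not how nltk implements <code>RecursiveDescentParser</code>; the names <code>GRAMMAR</code>, <code>expand</code>, <code>expand_sequence</code>, <code>parse</code> and <code>bracketed</code> are hypothetical, and trees are represented as plain <code>(label, children)</code> tuples rather than <code>nltk.Tree</code> objects.

```python
# The grammar from this section as a dict: nonterminal -> list of productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "V": [["saw"], ["ate"], ["walked"]],
    "NP": [["John"], ["Mary"], ["Bob"], ["Det", "N"], ["Det", "N", "PP"]],
    "Det": [["a"], ["an"], ["the"], ["my"]],
    "N": [["man"], ["dog"], ["cat"], ["telescope"], ["park"]],
    "P": [["in"], ["on"], ["by"], ["with"]],
}


def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` matches tokens from `pos`."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:  # try every production in turn
        for children, end in expand_sequence(production, tokens, pos):
            yield (symbol, children), end


def expand_sequence(symbols, tokens, pos):
    """Yield (subtrees, next_pos) for matching a sequence of symbols in order."""
    if not symbols:
        yield [], pos
        return
    for first, mid in expand(symbols[0], tokens, pos):
        for rest, end in expand_sequence(symbols[1:], tokens, mid):
            yield [first] + rest, end


def parse(tokens):
    """Return all trees rooted at S that consume the whole sentence."""
    return [tree for tree, end in expand("S", tokens, 0) if end == len(tokens)]


def bracketed(tree):
    """Render a tree in bracket notation, e.g. (NP (Det the) (N man))."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


trees = parse("the man saw Bob with the telescope".split())
for tree in trees:
    print(bracketed(tree))

# A sentence where the PP can attach to the verb phrase or to the noun phrase
ambiguous = parse("the man saw the dog with the telescope".split())
print(len(ambiguous), "parses")
```

Under this grammar the first sentence has a single parse, because the proper noun "Bob" cannot take a prepositional phrase. Replacing "Bob" with "the dog" yields two parses, since "with the telescope" can then attach either to the verb phrase or to the noun phrase. This simple top-down strategy only terminates because the grammar has no left-recursive rules.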
[[Category:旧词条迁移]]