Analyzing Syntactic Structure
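Sentence Constituents
For texts made up of fairly short sentences, statistics such as the frequencies of n-gram tokens (sequences of n consecutive words) are enough to characterize the text. For longer sentences, however, a more precise analysis requires examining the syntactic structure of the sentence and the part of speech of each word. For example, the table below lists the symbols used for syntactic categories and parts of speech: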
Symbol  Meaning               Example
S       sentence              the man walked
NP      noun phrase           a dog
VP      verb phrase           saw a park
PP      prepositional phrase  with a telescope
Det     determiner            the
N       noun                  dog
V       verb                  walked
P       preposition           in
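The n-gram statistics mentioned above can be sketched in a few lines of plain Python. This is only a minimal illustration: the helper name `ngrams` and the sample sentence are assumptions made for this example, not part of the original text.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Sample sentence chosen for illustration only.
tokens = "the man saw the dog and the man walked".split()

# Count how often each bigram (n = 2) occurs in the token list.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
```

Counts like these capture which words tend to occur together, but not how the words group into phrases, which is what the grammar-based analysis addresses.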
Parsing and Visualization
To break a sentence apart and show its structure, we can use the nltk package as follows:
import nltk
# Define a small context-free grammar for the example sentences.
grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")
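# Split the sentence into tokens and parse it with a recursive descent parser.
sent = "the man saw Bob with the telescope".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)
# parse() yields every parse tree the grammar allows for the sentence.
for tree in rd_parser.parse(sent):
    print(tree)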
# Build the same parse tree by hand, then draw it in a window.
tree1 = nltk.Tree('S', [
    nltk.Tree('NP', [nltk.Tree('Det', ['the']), nltk.Tree('N', ['man'])]),
    nltk.Tree('VP', [
        nltk.Tree('V', ['saw']),
        nltk.Tree('NP', ['Bob']),
        nltk.Tree('PP', [
            nltk.Tree('P', ['with']),
            nltk.Tree('NP', [nltk.Tree('Det', ['the']), nltk.Tree('N', ['telescope'])]),
        ]),
    ]),
])
tree1.draw()
Running this finally produces a diagram of the parse tree.
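When no graphical display is available, the same tree can also be inspected as text. The sketch below assumes NLTK 3 and rebuilds the tree from a bracketed string with `nltk.Tree.fromstring`; the variable names are chosen for this example only.

```python
import nltk

# Bracketed form of the parse tree for "the man saw Bob with the telescope".
bracketed = (
    "(S (NP (Det the) (N man)) "
    "(VP (V saw) (NP Bob) (PP (P with) (NP (Det the) (N telescope)))))"
)
tree = nltk.Tree.fromstring(bracketed)

# Render the tree as ASCII art in the terminal instead of opening a window.
tree.pretty_print()

# The root label and the leaves recover the category and the original words.
print(tree.label())
print(tree.leaves())

# Collect the words covered by every noun phrase (NP) in the tree.
noun_phrases = [
    " ".join(sub.leaves())
    for sub in tree.subtrees(lambda t: t.label() == "NP")
]
print(noun_phrases)
```

Walking the subtrees this way is one simple approach to pulling phrases of a given category out of a parsed sentence.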