Analyzing Syntactic Structure

Sentence Constituents

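For text made up of relatively short sentences, statistics such as the frequencies of n-gram tokens (sequences of n consecutive words) are enough to characterize the text. For longer sentences, however, we need to analyze the syntactic structure and the part of speech of each word to get a more precise analysis. For example, the table below lists the symbols commonly used for syntactic categories and parts of speech: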

Symbol  Meaning               Example
S       sentence              the man walked
NP      noun phrase           a dog
VP      verb phrase           saw a park
PP      prepositional phrase  with a telescope
Det     determiner            the
N       noun                  dog
V       verb                  walked
P       preposition           in
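As a point of comparison for the n-gram statistics mentioned above, here is a minimal sketch of counting bigrams in plain Python. The helper name `ngrams` and the sample sentence are assumptions made for illustration; they are not part of the original text.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample text, split on whitespace.
tokens = "the man saw the dog and the man saw the cat".split()

# Count how often each bigram (n = 2) occurs.
bigram_counts = Counter(ngrams(tokens, 2))

for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

Counts like these describe which words tend to appear together, but they say nothing about which phrase a word belongs to. That is the gap the syntactic categories in the table are meant to fill.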

Parsing and Visualization

To break a sentence into its parts and display its structure, we can use the nltk package as follows:

    import nltk

    # Define a small context-free grammar.
    # NLTK 3 uses nltk.CFG.fromstring; older releases used nltk.parse_cfg.
    grammar1 = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    PP -> P NP
    V -> "saw" | "ate" | "walked"
    NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    Det -> "a" | "an" | "the" | "my"
    N -> "man" | "dog" | "cat" | "telescope" | "park"
    P -> "in" | "on" | "by" | "with"
    """)

    sent = "the man saw Bob with the telescope".split()

    # A recursive descent parser expands the grammar top-down.
    rd_parser = nltk.RecursiveDescentParser(grammar1)

    # parse() yields every tree the grammar allows (formerly nbest_parse).
    for tree in rd_parser.parse(sent):
        print(tree)

    # Build the same tree by hand so that it can be drawn.
    tree1 = nltk.Tree('S', [
        nltk.Tree('NP', [nltk.Tree('Det', ['the']), nltk.Tree('N', ['man'])]),
        nltk.Tree('VP', [
            nltk.Tree('V', ['saw']),
            nltk.Tree('NP', ['Bob']),
            nltk.Tree('PP', [
                nltk.Tree('P', ['with']),
                nltk.Tree('NP', [nltk.Tree('Det', ['the']),
                                 nltk.Tree('N', ['telescope'])])])])])

    # Opens a GUI window showing the tree (requires Tkinter).
    tree1.draw()

Finally, this produces the following figure:
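When no graphical window is available, the same tree can also be inspected as text. The following is a minimal sketch, assuming NLTK 3, in which `Tree.fromstring`, `label()`, `leaves()`, `subtrees()` and `pretty_print()` are available. The bracketed string is simply a hand-written form of the tree built above.

```python
import nltk

# Bracketed notation for the same parse of
# "the man saw Bob with the telescope".
bracketed = """
(S
  (NP (Det the) (N man))
  (VP (V saw)
      (NP Bob)
      (PP (P with) (NP (Det the) (N telescope)))))
"""

tree = nltk.Tree.fromstring(bracketed)

# Basic inspection: root label and the words at the leaves.
print(tree.label())
print(tree.leaves())

# Collect the words covered by every noun phrase (NP) in the tree.
noun_phrases = [" ".join(t.leaves())
                for t in tree.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)

# Draw the tree as ASCII art in the terminal instead of a GUI window.
tree.pretty_print()
```

Walking the subtrees this way shows how the parse groups words into the phrases from the symbol table, for example that "with the telescope" forms a PP attached to the verb phrase.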