
Second Noun in Python NLTK Grammar RegexpParser Not Recognized

I wrote the following code to recognize a grammar consisting of a verb followed by one or more determiners and then one or more nouns. The chunker does not include a second noun in the same chunk, so the phrase is split in two (example phrase: "monitoring a parking space"):

Testing sentence in grammar:  monitoring a parking space
Grammar Chunk: 
(S (MT monitoring/VBG a/DT parking/NN) (MT space/NN))
False

Here is the code used in Python 3.5.6:

import nltk

def extractMT(sent):
    grammar = r"""
    MT:
        {<VBG|VBZ|VB>?<DT>?<NN|NNS>}
    """
    chunker = nltk.RegexpParser(grammar)

    ne = set()
    chunk = chunker.parse(nltk.pos_tag(nltk.word_tokenize(sent)))
    print("Grammar Chunk: ")
    print(chunk)

    for tree in chunk.subtrees(filter=lambda t: t.label() == 'MT'):
        returnList = []
        for child in tree.leaves():
            returnList.append(child[0])

        ne.add(' '.join(returnList))

    return ne

testSentence1 = "monitoring a parking space"

print ("Testing sentence in grammar:  " + testSentence1)

print ("Is sentence in grammar?:  " + testSentence1 in extractMT(testSentence1))

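As in standard regular expressions, matching a repeated element requires a quantifier: + (one or more) or * (zero or more). Without a quantifier, <NN|NNS> matches exactly one noun, so the second noun ends up in a separate chunk. Add + after the noun tag: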

 {<VBG|VBZ|VB>?<DT>?<NN|NNS>+}
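A minimal sketch of the corrected grammar in use. It assumes the tags shown in the question's output (monitoring/VBG a/DT parking/NN space/NN) and hard-codes them so the example does not need the tokenizer or tagger models; the names tagged, phrases and sentence are illustrative. It also wraps the membership test in parentheses: in the question's final print, + binds more tightly than in, so the concatenated string is what gets tested and the result is False whatever the grammar does.

```python
import nltk

# Assumption: pre-tagged tokens copied from the chunk output in the question,
# used instead of nltk.word_tokenize / nltk.pos_tag to avoid model downloads.
tagged = [("monitoring", "VBG"), ("a", "DT"), ("parking", "NN"), ("space", "NN")]

grammar = r"""
MT:
    {<VBG|VBZ|VB>?<DT>?<NN|NNS>+}
"""
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)

# Join the words of every MT chunk back into a phrase.
phrases = {
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "MT")
}

sentence = "monitoring a parking space"
# Parenthesize the membership test and convert it to str before concatenating.
print("Is sentence in grammar?:  " + str(sentence in phrases))
```

With the + quantifier, both nouns should fall into a single MT chunk, so the membership test succeeds.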

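You can also bound the repetition: {,2} matches 0, 1, or 2 elements, {1,2} matches 1 or 2, and {2} matches exactly 2. Note that {,2} also allows zero nouns, so {1,2} is the closer fit when at least one noun is required. For example: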

 {<VBG|VBZ|VB>?<DT>?<NN|NNS>{,2}}
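To compare the bounded quantifiers, here is a small sketch under the same assumption of hard-coded tags. The helper mt_chunks and the sample inputs are hypothetical names introduced for illustration, not part of NLTK.

```python
import nltk

def mt_chunks(noun_quantifier, tagged):
    """Return the MT chunks found when the noun tag carries the given quantifier."""
    grammar = "MT: {<VBG|VBZ|VB>?<DT>?<NN|NNS>" + noun_quantifier + "}"
    tree = nltk.RegexpParser(grammar).parse(tagged)
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "MT")
    ]

# Assumption: hand-tagged inputs, so no tokenizer or tagger models are needed.
two_nouns = [("monitoring", "VBG"), ("a", "DT"), ("parking", "NN"), ("space", "NN")]
one_noun = [("a", "DT"), ("space", "NN")]

print(mt_chunks("{1,2}", two_nouns))  # ['monitoring a parking space']
print(mt_chunks("{2}", two_nouns))    # ['monitoring a parking space']
print(mt_chunks("{1,2}", one_noun))   # ['a space']
print(mt_chunks("{2}", one_noun))     # []
```

The last call returns nothing because {2} demands exactly two nouns, which is why {1,2} or + is usually the safer choice for noun phrases of varying length.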
