
NLTK grammar for a list in Python

I have to create an NLTK grammar for a list in Python. I have this grammar for a text:

import nltk

grammar1 = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    PP -> P NP
    V -> "saw" | "ate" | "walked"
    NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    Det -> "a" | "an" | "the" | "my"
    N -> "man" | "dog" | "cat" | "telescope" | "kitchen"
    P -> "in" | "on" | "by" | "with"
    """)

sent = "the cat ate a telescope in the kitchen".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)

for tree in rd_parser.parse(sent):
    print(tree)

Now, how can I do the same for a list? I need to test legal and illegal lists with a basic grammar. I couldn't find any information about using NLTK with lists, and I don't really understand how to do this.

Notice that the following line of code already creates a list (of strings).

sent = "the cat ate a telescope in the kitchen".split()

You have also created a recursive descent parser for your grammar using the following line. Note that you only need to do this once.

rd_parser = nltk.RecursiveDescentParser(grammar1)

Now, if you want to test a different list of tokens, simply do something like this:

L = ["John", "walked", "the", "dog"]
result = rd_parser.parse(L)  # an iterator over the parse trees found for L
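
To separate legal lists from illegal ones, you can check whether the parser yields at least one tree. The following is a minimal sketch, not a definitive implementation: is_legal is a hypothetical helper name, and treating the ValueError that NLTK raises for words missing from the grammar as "illegal" is my own assumption about what you want.

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    PP -> P NP
    V -> "saw" | "ate" | "walked"
    NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    Det -> "a" | "an" | "the" | "my"
    N -> "man" | "dog" | "cat" | "telescope" | "kitchen"
    P -> "in" | "on" | "by" | "with"
    """)
rd_parser = nltk.RecursiveDescentParser(grammar)


def is_legal(parser, tokens):
    """Return True if the parser finds at least one tree for tokens.

    Hypothetical helper, not part of NLTK.
    """
    try:
        trees = list(parser.parse(tokens))
    except ValueError:
        # NLTK raises ValueError when a token is not covered by the grammar.
        # Assumption: such a list counts as illegal.
        return False
    return len(trees) > 0


tests = [
    ["John", "walked", "the", "dog"],      # grammatical
    ["John", "the", "dog"],                # known words, but no verb
    ["John", "walked", "the", "unicorn"],  # "unicorn" is not in the grammar
    [],                                    # empty list
]
for tokens in tests:
    print(tokens, "->", "legal" if is_legal(rd_parser, tokens) else "illegal")
```

Only the first list should come out as legal. Note that tokens are case-sensitive, so "john" is not the same word as "John".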

You have a parser that can be applied to lists of tokens. You have a collection of test materials in different formats. Quoting from your comment: "empty list, list with one token, list with several tokens, list with numbers, tuple, and dictionary."

The parser can handle "sequences" of strings, which in your case means a list or tuple whose elements are strings (and each string is a word). The parser cannot handle anything else; if your code has to deal with other types, write Python code to check their type before the parser sees them.

You'll be interested in the built-in functions isinstance() (preferred) and type(). For example:

if isinstance(sent, (tuple, list)) and all(isinstance(w, str) for w in sent):
    # A tuple or list of strings; try to parse it.
    trees = rd_parser.parse(sent)
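
As a sketch of how that check behaves on the kinds of test material you listed, here it is wrapped in a function. The name is_token_sequence and the sample values are hypothetical, chosen only for illustration; the check itself needs nothing from NLTK.

```python
def is_token_sequence(obj):
    """Return True if obj is a list or tuple whose elements are all strings.

    Hypothetical helper: run it before handing obj to the parser.
    """
    return isinstance(obj, (list, tuple)) and all(isinstance(w, str) for w in obj)


samples = [
    [],                              # empty list
    ["John"],                        # list with one token
    ["John", "walked", "the", "dog"],  # list with several tokens
    [1, 2, 3],                       # list with numbers
    ("John", "saw", "Mary"),         # tuple of strings
    {"John": 1},                     # dictionary
    "John saw Mary",                 # bare string, not split into tokens
]
for obj in samples:
    print(repr(obj), "->", is_token_sequence(obj))
```

The empty list passes this type check, because all() is true for an empty sequence; whether it is a legal sentence is then up to the grammar. The bare string is rejected, which is useful because iterating over a string would give single characters rather than words.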
