NLTK CFG Grammar with multiple words

NLTK 3.0:

Consider a CFG like the one below, in which the nonterminal team has four alternatives, one of which consists of two words ('sri lankan').

When I generate the sentences this grammar can produce, the two-word value appears in the results. But when I try to parse an input sentence containing that two-word value, the parse fails.

import nltk
from nltk.parse import generate
from nltk.grammar import Nonterminal


cfg = nltk.CFG.fromstring("""
root -> who_player has the most runs
who_player -> who
who_player -> which player
who_player -> which team player
who -> 'who'
which -> 'which'
player -> 'player'
team -> 'indian' | 'australian' | 'england' | 'sri lankan'
has -> 'has'
the -> 'the'
this -> 'this'
most -> 'most'
runs -> 'runs'
""")

print(list((n,sent) for n, sent in enumerate(generate.generate(cfg, n=100, start=Nonterminal('root')), 1)))

# generate() above yields ['which', 'sri lankan', 'player', 'has', 'the', 'most', 'runs'],
# with 'sri lankan' as a single token, but ChartParser cannot parse the same sentence.

result1 = nltk.ChartParser(cfg).parse('which england player has the most runs'.split())
print(list(result1))
result2 = nltk.ChartParser(cfg).parse('which sri lankan player has the most runs'.split()) # Does not work.
print(list(result2))

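How can I make multiword terminals work with ChartParser?

Pipes separate the alternatives of a production, and each quoted string is a single terminal, i.e. one token. Because .split() turns the input into the separate tokens 'sri' and 'lankan', the single terminal 'sri lankan' can never match. Write the multiword expression as adjacent terminals separated by a space instead; the parser then builds a single team subtree with two leaves.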

team -> 'indian' | 'australian' | 'england' | 'sri' 'lankan'

[out]:

[Tree('root', [Tree('who_player', [Tree('which', ['which']), Tree('team', ['sri', 'lankan']), Tree('player', ['player'])]), Tree('has', ['has']), Tree('the', ['the']), Tree('most', ['most']), Tree('runs', ['runs'])])]
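
For reference, the fix above can be checked end to end with a short script. This is a minimal sketch, assuming NLTK 3.x is installed; the variable names (grammar, tokens, trees) are only illustrative, and the who_player alternatives are folded onto one line with pipes, which is equivalent to the three separate rules in the question.

```python
import nltk

# Grammar from the question, with the two-word team name written as two
# adjacent terminals ('sri' 'lankan') instead of one terminal ('sri lankan').
grammar = nltk.CFG.fromstring("""
root -> who_player has the most runs
who_player -> who | which player | which team player
who -> 'who'
which -> 'which'
player -> 'player'
team -> 'indian' | 'australian' | 'england' | 'sri' 'lankan'
has -> 'has'
the -> 'the'
most -> 'most'
runs -> 'runs'
""")

# str.split() yields 'sri' and 'lankan' as separate tokens, which now line up
# with the two terminals in the team rule.
tokens = 'which sri lankan player has the most runs'.split()

parser = nltk.ChartParser(grammar)
trees = list(parser.parse(tokens))

for tree in trees:
    print(tree)
```

The input tokenization is left unchanged here; only the grammar is adapted so that its terminals match what .split() produces.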
