NLTK .fcfg grammar parser: IndexError (list index out of range)
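I'm new to NLTK. I'm trying to convert "show me the movies" into a simple SQL SELECT statement, "SELECT title FROM films".
I believe the sentence is (VP + NP), with VP = (V + PRO) and NP = (DET + N). But I have no doubt that the .fcfg grammar I set up is incorrect: I get the following IndexError on the line "answer = trees[0].label()['SEM']", because trees is empty.
How do I correct the .fcfg?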
IndexError: list index out of range
Process finished with exit code 1
Grammar (parser3.fcfg)
% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
Det[SEM=''] -> 'the'
PRO[SEM=''] -> 'me'
N[SEM='title FROM films'] -> 'movies'
V[SEM='SELECT'] -> 'show'
Python code
from nltk import load_parser
cp = load_parser('parser3.fcfg')
query = 'show me the movies'
trees = list(cp.parse(query.split()))
print(trees)
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]
q = ' '.join(answer)
print(q)
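The IndexError is raised by trees[0] when trees is empty, i.e. when the grammar found no parse at all. A minimal sketch of the post-processing step that fails with a clearer message instead and flattens the SEM value before joining. `first_tree`, `flatten`, `sem_to_sql` and `NoParseError` are hypothetical helpers, not NLTK APIs, and the sketch assumes SEM is a fully bound (possibly nested) tuple of strings, which is how NLTK prints it, e.g. `(, title FROM films)`.

```python
class NoParseError(ValueError):
    """Raised when the grammar yields no tree for the input sentence."""


def first_tree(trees, sentence=""):
    """Return trees[0], or raise a descriptive error instead of IndexError."""
    if not trees:
        raise NoParseError(f"grammar produced no parse for {sentence!r}")
    return trees[0]


def flatten(value):
    """Yield the strings inside a possibly nested tuple of SEM fragments."""
    if isinstance(value, str):
        yield value
    else:
        for part in value:
            yield from flatten(part)


def sem_to_sql(sem):
    """Join the non-empty SEM fragments with single spaces."""
    return " ".join(part for part in flatten(sem) if part)


# Usage with the question's code (needs NLTK and a grammar that parses):
#   trees = list(cp.parse(query.split()))
#   q = sem_to_sql(first_tree(trees, query).label()['SEM'])
```

With this, an empty parse list surfaces as "grammar produced no parse for 'show me the movies'", which points at the grammar rather than at the indexing code.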
To debug a grammar, start small and build up the rules one step at a time.
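One way to apply this advice systematically is to keep each intermediate grammar as a stage and report the first stage that stops parsing. A minimal sketch; `first_failing_stage` and `nltk_parse_count` are hypothetical helpers, and the NLTK calls are the same grammar.FeatureGrammar.fromstring and parse.FeatureEarleyChartParser used in the steps that follow. NLTK is imported lazily inside the adapter so the harness itself can be exercised with any parse-counting function.

```python
def nltk_parse_count(grammar_str, sentence):
    """Return how many trees NLTK's feature Earley parser finds."""
    from nltk import grammar, parse  # lazy import: only needed for real runs

    g = grammar.FeatureGrammar.fromstring(grammar_str)
    parser = parse.FeatureEarleyChartParser(g)
    return len(list(parser.parse(sentence.split())))


def first_failing_stage(stages, count_parses=nltk_parse_count):
    """Return the index of the first (grammar, sentence) stage with no parse.

    Returns None when every stage yields at least one tree.
    """
    for index, (grammar_str, sentence) in enumerate(stages):
        if count_parses(grammar_str, sentence) == 0:
            return index
    return None
```

Feeding it the grammars below in order, each paired with its test sentence, should point straight at the stage where the parse list becomes empty.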
from nltk import grammar, parse
from nltk.parse.generate import generate
g = """
VP -> V N
V[SEM='SELECT'] -> 'show'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show movies'.split())
print (list(trees))
[out]:
[Tree(VP[], [Tree(V[SEM='SELECT'], ['show']), Tree(N[SEM='title FROM films'], ['movies'])])]
g = """
VP -> V NP
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
DT[SEM=''] -> 'the'
V[SEM='SELECT'] -> 'show'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show the movies'.split())
print (list(trees))
[out]:
[Tree(VP[], [Tree(V[SEM='SELECT'], ['show']), Tree(NP[SEM=(, title FROM films)], [Tree(DT[SEM=''], ['the']), Tree(N[SEM='title FROM films'], ['movies'])])])]
We want to parse the sentence "show me the movies" as
S[ VP[show me] NP[the movies] ]
so we have to change the TOP rule to S -> VP NP.
g = """
S -> VP NP
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
V[SEM='SELECT'] -> 'show'
PRO[SEM=''] -> 'me'
DT[SEM=''] -> 'the'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show me the movies'.split())
print (list(trees))
[out]:
[Tree(S[], [Tree(VP[SEM=(SELECT, )], [Tree(V[SEM='SELECT'], ['show']), Tree(PRO[SEM=''], ['me'])]), Tree(NP[SEM=(, title FROM films)], [Tree(DT[SEM=''], ['the']), Tree(N[SEM='title FROM films'], ['movies'])])])]
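In that output VP and NP carry SEM values but S[] does not, because this TOP rule has no SEM of its own. To see what S would end up with if it concatenated its children in VP NP order, here is a plain-Python model of the bottom-up composition. It is a simplified stand-in for NLTK's feature unification (it just concatenates tuples, mirroring how VP[SEM=(SELECT, )] and NP[SEM=(, title FROM films)] print); `LEXICON`, `compose` and `to_sql` are illustrative names, not NLTK APIs.

```python
# Word -> SEM fragment, copied from the lexical rules of the grammar.
LEXICON = {
    "show": "SELECT",
    "me": "",
    "the": "",
    "movies": "title FROM films",
}


def compose(node):
    """Return the SEM tuple of a node given as (label, child, ...) or a word.

    Assumes every phrasal rule concatenates its children's SEM left to right,
    like VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro].
    """
    if isinstance(node, str):
        return (LEXICON[node],)
    _label, *children = node
    result = ()
    for child in children:
        result += compose(child)
    return result


def to_sql(sem):
    """Drop empty fragments and join the rest with spaces."""
    return " ".join(part for part in sem if part)


vp_np = ("S", ("VP", "show", "me"), ("NP", "the", "movies"))
print(compose(vp_np))          # ('SELECT', '', '', 'title FROM films')
print(to_sql(compose(vp_np)))  # SELECT title FROM films
```

Because VP comes first, SELECT lands at the front of the string, which is the order the target query needs.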
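Right now our TOP rule is underspecified (it carries no SEM), but if we specify features on both the left-hand side (LHS) and the right-hand side (RHS), we find that it doesn't work: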
g = """
S[SEM=(?vp + WHERE + ?np)] -> VP[SEM=?vp] NP[SEM=?np]
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
V[SEM='SELECT'] -> 'show'
PRO[SEM=''] -> 'me'
DT[SEM=''] -> 'the'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show me the movies'.split())
print (list(trees))
Even if we remove the WHERE from the semantics, we see that it still doesn't parse:
g = """
S[SEM=(?vp + ?np)] -> VP[SEM=?vp] NP[SEM=?np]
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
V[SEM='SELECT'] -> 'show'
PRO[SEM=''] -> 'me'
DT[SEM=''] -> 'the'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show me the movies'.split())
print (list(trees))
[out]:
[]
However, if we specify features only on the RHS, it parses:
g = """
S -> VP[SEM=?vp] NP[SEM=?np]
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
V[SEM='SELECT'] -> 'show'
PRO[SEM=''] -> 'me'
DT[SEM=''] -> 'the'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show me the movies'.split())
print (list(trees))
[out]:
[Tree(S[], [Tree(VP[SEM=(SELECT, )], [Tree(V[SEM='SELECT'], ['show']), Tree(PRO[SEM=''], ['me'])]), Tree(NP[SEM=(, title FROM films)], [Tree(DT[SEM=''], ['the']), Tree(N[SEM='title FROM films'], ['movies'])])])]
The same happens when we specify features only on the LHS:
g = """
S[SEM=(?vp + WHERE + ?np)] -> VP NP
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
V[SEM='SELECT'] -> 'show'
PRO[SEM=''] -> 'me'
DT[SEM=''] -> 'the'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show me the movies'.split())
print (list(trees))
[out]:
[Tree(S[SEM=(?vp+WHERE+?np)], [Tree(VP[SEM=(SELECT, )], [Tree(V[SEM='SELECT'], ['show']), Tree(PRO[SEM=''], ['me'])]), Tree(NP[SEM=(, title FROM films)], [Tree(DT[SEM=''], ['the']), Tree(N[SEM='title FROM films'], ['movies'])])])]
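We can specify features on the non-terminals just as we did for NP and VP, so what makes the TOP rule (i.e. S -> VP NP) different?
What if we hack the grammar and just give it a unary branch?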
g = """
S -> SP
SP[SEM=(?vp + WHERE + ?np)] -> VP[SEM=?vp] NP[SEM=?np]
VP[SEM=(?v + ?pro)] -> V[SEM=?v] PRO[SEM=?pro]
NP[SEM=(?det + ?n)] -> DT[SEM=?det] N[SEM=?n]
V[SEM='SELECT'] -> 'show'
PRO[SEM=''] -> 'me'
DT[SEM=''] -> 'the'
N[SEM='title FROM films'] -> 'movies'
"""
my_grammar = grammar.FeatureGrammar.fromstring(g)
parser = parse.FeatureEarleyChartParser(my_grammar)
trees = parser.parse('show me the movies'.split())
print (list(trees))
[out]:
[Tree(S[], [Tree(SP[SEM=(SELECT, , WHERE, , title FROM films)], [Tree(VP[SEM=(SELECT, )], [Tree(V[SEM='SELECT'], ['show']), Tree(PRO[SEM=''], ['me'])]), Tree(NP[SEM=(, title FROM films)], [Tree(DT[SEM=''], ['the']), Tree(N[SEM='title FROM films'], ['movies'])])])])]
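With this unary workaround the SEM value sits on the SP node, not on the root S[], so the question's trees[0].label()['SEM'] would raise a KeyError. A minimal sketch that searches the tree top-down for the first node carrying SEM; `find_sem` is a hypothetical helper, and it assumes only that a parse tree is a list of children with a .label() method (as nltk.Tree is) and that feature-structure labels support `in` and `[...]` like a dict.

```python
def find_sem(tree, feature="SEM"):
    """Return the first `feature` value found in a pre-order walk, else None."""
    label = tree.label()
    if feature in label:
        return label[feature]
    for child in tree:
        # Leaves are plain word strings; only subtrees have a label().
        if hasattr(child, "label"):
            found = find_sem(child, feature)
            if found is not None:
                return found
    return None
```

For the tree above, joining the non-empty parts of find_sem(trees[0]) would give "SELECT WHERE title FROM films", so for the target query "SELECT title FROM films" the WHERE in the SP rule would also have to go.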
Someone should raise this as an issue on the NLTK GitHub repository. It looks like either a special feature that protects the TOP rule, or a bug =)