简体   繁体   中英

Python NLTK: parse sentence using conjoint structure, getting into infinite recursion

I am asked to create two different parse tree for the following sentence:

foo while bar and baz

Based on these two constructions:

S-> S while S
S-> S and S

The two different trees I have are the following:

Tree A)

     S
   / | \
  P  U  S
  |    /|\
  W   P U P  
      |   |
      W   W

Here is the code for A:

import nltk

groucho_grammar = nltk.CFG.fromstring ("""
S -> P U S | P U P
P -> W
U -> 'while' | 'and'
W -> 'foo'|'bar'|'baz'
""")

print(groucho_grammar)

sentence = "foo while bar and baz"

rd_parser = nltk.RecursiveDescentParser(groucho_grammar)
for tree in rd_parser.parse(sentence.split()):
    print(tree)

And the result for A:

(S (P (W foo)) (U while) (S (P (W bar)) (U and) (P (W baz))))

Tree B)

       S
     / | \
    S  U  P
  / | \    \
 P  U  P    W
 |     |
 W     W

Now for part B, I just modified the grammar to the following:

groucho_grammar = nltk.CFG.fromstring ("""
S -> S U P | P U P
P -> W
U -> 'while' | 'and'
W -> 'foo'|'bar'|'baz'
""")

But I am getting infinite recursion error:

    if isinstance(index, (int, slice)):
RuntimeError: maximum recursion depth exceeded in __instancecheck__

Any help would be appreciated.

Thanks.

Your problem is this rule: S -> SUP | PUP S -> SUP | PUP

By allowing S to begin with an instance of S, you allow this infinite recursion:

S -> S U P
S -> (S U P) U P
S -> ((S U P) U P) U P
S -> (((S U P) U P) U P) U P

This is called left recursion, and it is caused by a symbol expanding to itself, in this case S expanding to S.

From the NLTK book, chapter 8 :

Recursive descent parsing has three key shortcomings. First, left-recursive productions like NP -> NP PP send it into an infinite loop.

A solution

Luckily, you can simply change the parser you use to one that does not share the left-recursive Achilles heel. Simple change this:

rd_parser = nltk.RecursiveDescentParser(groucho_grammar)

to this:

rd_parser = nltk.parse.chart.BottomUpLeftCornerChartParser(groucho_grammar)

This way you make use of the left-recursive-resistant BottomUpLeftCornerChartParser

Further reading

The left-recursive problem is well-known in automata theory. There are ways to make your grammar non-recursive, as explained in these links:

  1. http://www.cs.engr.uky.edu/~lewis/essays/compilers/rec-des.html
  2. http://www.umsl.edu/~kjs9rc/CS4890/presentation.pdf
  3. http://research.microsoft.com/pubs/68869/naacl2k-proc-rev.pdf

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM