
Stuck on translating a CFG to a DCG

I'm trying to teach myself Prolog by implementing an interpreter for a simple arithmetic CFG:

<expression> --> number
<expression> --> ( <expression> )
<expression> --> <expression> + <expression>
<expression> --> <expression> - <expression>
<expression> --> <expression> * <expression>
<expression> --> <expression> / <expression> 

So far, I've written this in SWI-Prolog, which hits a number of bugs described below:

expression(N) --> number(Cs), { number_codes(N, Cs) }.
expression(N) --> "(", expression(N), ")".
expression(N) --> expression(X), "+", expression(Y), { N is X + Y }.
expression(N) --> expression(X), "-", expression(Y), { N is X - Y }.

number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

Testing with

phrase(expression(X), "12+4"). 

gives X = 16, which is good. Also

phrase(expression(X), "(12+4)"). 

works, and so does phrase(expression(X), "12+4+5").

But trying

phrase(expression(X), "12-4"). 

causes "ERROR: Out of local stack" unless I comment out the "+" rule. And while I can add more than two numbers, brackets don't work recursively (e.g. "(1+2)+3" hangs).

I'm sure the solution is simple, but I haven't been able to figure it out from the online tutorials I've found.

Everything you did is correct in principle. And you're right: the answer is simple.

But.

Left recursion is fatal in definite-clause grammars; the symptom is precisely the behavior you are seeing.

If you set a spy point on expression and use the trace facility, you can watch your stack grow and grow and grow while the parser makes no progress at all.

gtrace.
spy(expression).
phrase(expression(N),"12-4").

If you think carefully about the Prolog execution model, you can see what is happening.

  1. We try to parse "12-4" as an expression.

    Our call stack contains this call to expression from step 1, which I will write as expression (1).

  2. We succeed in parsing "12" as an expression by the first clause for "expression", and we record a choice point in case we need to backtrack later. In fact we need to backtrack immediately, because the parent request involving phrase says we want to parse the entire string, and we haven't: we still have "-4" to go. So we fail and go back to the choice point. We have shown that the first clause of "expression" doesn't succeed, so we retry against the second clause.

    The call stack: expression (1).

  3. We try to parse "12-4" using the second clause for "expression", but fail immediately (the initial character is not "("). So we fail and retry against the third clause.

    Call stack: expression (1).

  4. The third clause asks us to parse an expression off the beginning of the input and then find a "+" and another expression. So we must try now to parse the beginning of the input as an expression.

    Call stack: expression (4) expression (1).

  5. We try to parse the beginning of "12-4" as an expression, and succeed with "12", just as in step 1. We record a choice point in case we need to backtrack later.

    Call stack: expression (4) expression (1).

  6. We now resume the attempt begun in step 4 to parse "12-4" as an expression against clause 3 of "expression". We've done the first bit; now we must try to parse "-4" as the remainder of the right-hand side of clause 3 of "expression", namely "+", expression(Y). But "-" is not "+", so we fail immediately and go back to the most recently recorded choice point, the one recorded in step 5. The next thing is to try to find a different way of parsing the beginning of the input as an expression. We resume this search with the second clause of "expression".

    Call stack: expression (4) expression (1).

  7. Once again the second clause fails. So we continue with the third clause of "expression". This asks us to look for an expression at the beginning of the input (as part of figuring out whether our current two calls to "expression" can succeed or will fail). So we call "expression" again.

    Call stack: expression (7) expression (4) expression (1).

You can see that each time we add a call to expression to the stack, we are going to succeed, look for a plus, fail, and try again, eventually reaching the third clause, at which point we will push another call on the stack and try again.
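If the graphical tracer is not available, the lack of progress can also be observed directly. The sketch below is a hypothetical instrumented copy of the question's grammar for SWI-Prolog: the names expr//2, expr_//2 and depth_check//1, the no_progress/2 exception and the cap of 50 nested calls are all made up for illustration. Each call to expr//2 carries its nesting depth, and once the cap is exceeded it throws, reporting the input that was still unparsed at that moment.

```prolog
% Hypothetical instrumented copy of the question's left-recursive grammar
% (assumes SWI-Prolog). The second argument of expr//2 counts nested calls.
:- set_prolog_flag(double_quotes, codes).

expr(N, D) --> depth_check(D), { D1 is D + 1 }, expr_(N, D1).

% The same four rules as the question, in the same order.
expr_(N, _) --> number(Cs), { number_codes(N, Cs) }.
expr_(N, D) --> "(", expr(N, D), ")".
expr_(N, D) --> expr(X, D), "+", expr(Y, D), { N is X + Y }.
expr_(N, D) --> expr(X, D), "-", expr(Y, D), { N is X - Y }.

% Consumes no input; throws once nesting passes an arbitrary cap of 50,
% carrying the input that is still unparsed at that point.
depth_check(D, S, S) :-
    (   D > 50
    ->  throw(no_progress(D, S))
    ;   true
    ).

% The question's number//1 and digit//1, unchanged.
number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.
```

Parsing 12-4 with this version throws no_progress(51, Rest) where Rest is the complete input: fifty levels of expression calls have been stacked up without a single character being consumed. A parse that does succeed, such as 12+4, is unaffected and still gives 16.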

Short answer: left recursion is fatal in DCGs.

It's also fatal in recursive-descent parsers, and the solution is much the same: don't recur to the left.

A non-left-recursive version of your grammar would be:

expression(N) --> term(N).
expression(N) --> term(X), "+", expression(Y), { N is X + Y }.
expression(N) --> term(X), "-", expression(Y), { N is X - Y }.
term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".
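To try this grammar out, here is a self-contained version, assuming SWI-Prolog and reusing the question's number//1 and digit//1. The double_quotes flag is set so that "..." in the source denotes a list of character codes, which is how the question's queries treat it.

```prolog
% The non-left-recursive grammar above, made self-contained (SWI-Prolog).
% The flag makes "..." denote code lists, as in pre-7 SWI-Prolog versions.
:- set_prolog_flag(double_quotes, codes).

expression(N) --> term(N).
expression(N) --> term(X), "+", expression(Y), { N is X + Y }.
expression(N) --> term(X), "-", expression(Y), { N is X - Y }.

term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".

% number//1 and digit//1 are the question's definitions, unchanged.
number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.
```

Both 12-4 and (1+2)+3 now terminate, giving 8 and 6. However, 12-4-5 yields 13, because it is parsed as 12-(4-5) rather than (12-4)-5.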

However, this makes "-" right-associative and requires the initial term to be reparsed repeatedly in many cases, so a common approach in code intended for production is to do something less like the BNF you started with and more like the following EBNF version:

expression = term {("+"|"-") term}.
term = number | "(" expression ")".

The way I learned to write it (long enough ago that I no longer remember whom to credit for it) is something like this (I found it ugly at first, but it grows on you):

expression(N) --> term(X), add_op_sequence(X,N).
add_op_sequence(LHS0, Result) -->
    "+", term(Y),
    {LHS1 is LHS0 + Y},
    add_op_sequence(LHS1,Result).
add_op_sequence(LHS0, Result) -->
    "-", term(Y),
    {LHS1 is LHS0 - Y},
    add_op_sequence(LHS1,Result).
add_op_sequence(N,N) --> [].

term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".

The value accumulated so far is passed down in the left-hand argument of add_op_sequence and eventually (when the sequence ends with the empty production) passed back up as a result.
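The question's CFG also has rules for * and /. One way to cover them, sketched here under the assumption of SWI-Prolog and not taken from the answer above, is to add a second precedence level that reuses the same accumulator pattern: a term becomes a sequence of factors joined by "*" or "/". The names factor//1 and mul_op_sequence//2 are hypothetical; number//1 and digit//1 are the question's.

```prolog
% Sketch (assumes SWI-Prolog): the accumulator pattern extended with a
% second precedence level for "*" and "/".
% factor//1 and mul_op_sequence//2 are hypothetical names.
:- set_prolog_flag(double_quotes, codes).

% An expression is a term followed by zero or more "+ term" / "- term".
expression(N) --> term(X), add_op_sequence(X, N).

add_op_sequence(Acc0, N) --> "+", term(Y), { Acc is Acc0 + Y }, add_op_sequence(Acc, N).
add_op_sequence(Acc0, N) --> "-", term(Y), { Acc is Acc0 - Y }, add_op_sequence(Acc, N).
add_op_sequence(N, N) --> [].

% A term is a factor followed by zero or more "* factor" / "/ factor",
% folded left to right in the same way.
term(N) --> factor(X), mul_op_sequence(X, N).

mul_op_sequence(Acc0, N) --> "*", factor(Y), { Acc is Acc0 * Y }, mul_op_sequence(Acc, N).
mul_op_sequence(Acc0, N) --> "/", factor(Y), { Acc is Acc0 / Y }, mul_op_sequence(Acc, N).
mul_op_sequence(N, N) --> [].

factor(N) --> number(Cs), { number_codes(N, Cs) }.
factor(N) --> "(", expression(N), ")".

% The question's number//1 and digit//1, unchanged.
number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.
```

With this layering, 2+3*4 evaluates to 14 and 12-4-5 to 3, since both levels fold from the left. Note that / is Prolog's arithmetic division, so in a default SWI-Prolog setup 7/2 gives 3.5; use // instead if integer division is wanted.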

The parsing strategy known as 'left-corner parsing' is a way of dealing with this problem; books on the use of Prolog in natural-language processing will almost invariably discuss it.
