Stuck on translating a cfg to dcg

I'm trying to teach myself Prolog and am implementing an interpreter for a simple arithmetic CFG:

<expression> --> number
<expression> --> ( <expression> )
<expression> --> <expression> + <expression>
<expression> --> <expression> - <expression>
<expression> --> <expression> * <expression>
<expression> --> <expression> / <expression> 

So far, I've written this in SWI-Prolog, which hits a number of bugs described below:

expression(N) --> number(Cs), { number_codes(N, Cs) }.
expression(N) --> "(", expression(N), ")".
expression(N) --> expression(X), "+", expression(Y), { N is X + Y }.
expression(N) --> expression(X), "-", expression(Y), { N is X - Y }.

number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

Testing with

phrase(expression(X), "12+4"). 

gives X = 16, which is good. Also

phrase(expression(X), "(12+4)"). 

works, and phrase(expression(X), "12+4+5") is OK too.

But trying

phrase(expression(X), "12-4"). 

causes "ERROR: Out of local stack" unless I comment out the "+" rule. And while I can add more than two numbers, brackets don't work recursively (i.e. "(1+2)+3" hangs).

I'm sure the solution is simple, but I haven't been able to figure it out from the online tutorials I've found.

Everything you did is correct in principle. And you're right: the answer is simple.

But.

Left recursion is fatal in definite-clause grammars; the symptom is precisely the behavior you are seeing.

If you set a spy point on expression and use the trace facility, you can watch your stack grow and grow and grow while the parser makes no progress at all.

gtrace.
spy(expression).
phrase(expression(N),"12-4").

If you think carefully about the Prolog execution model, you can see what is happening.

  1. We try to parse "12-4" as an expression.

    Our call stack contains this call to expression from step 1, which I will write expression (1).

  2. We succeed in parsing "12" as an expression, by the first clause for "expression", and we record a choice point in case we need to backtrack later. In fact we need to backtrack immediately, because the parent request involving phrase says we want to parse the entire string, and we haven't: we still have "-4" to go. So we fail and go back to the choice point. We have shown that the first clause of "expression" doesn't succeed, so we retry against the second clause.

    Call stack: expression (1).

  3. We try to parse "12-4" using the second clause for "expression", but fail immediately (the initial character is not "("). So we fail and retry against the third clause.

    Call stack: expression (1).

  4. The third clause asks us to parse an expression off the beginning of the input and then find a "+" and another expression. So we must try now to parse the beginning of the input as an expression.

    Call stack: expression (4) expression (1).

  5. We try to parse the beginning of "12-4" as an expression, and succeed with "12", just as in step 2. We record a choice point in case we need to backtrack later.

    Call stack: expression (4) expression (1).

  6. We now resume the attempt begun in step 4 to parse "12-4" as an expression against clause 3 of "expression". We've done the first bit; now we must try to parse "-4" as the remainder of the right-hand side of clause 3 of "expression", namely "+", expression(Y). But "-" is not "+", so we fail immediately, and go back to the most recently recorded choice point, the one recorded in step 5. The next thing is to try to find a different way of parsing the beginning of the input as an expression. We resume this search with the second clause of "expression".

    Call stack: expression (4) expression (1).

  7. Once again the second clause fails. So we continue with the third clause of "expression". This asks us to look for an expression at the beginning of the input (as part of figuring out whether our current two calls to "expression" can succeed or will fail). So we call "expression" again.

    Call stack: expression (7) expression (4) expression (1).

You can see that each time we add a call to expression to the stack, we are going to succeed, look for a plus, fail, and try again, eventually reaching the third clause, at which point we will push another call on the stack and try again.
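
The loop traced above can be made visible without exhausting the stack. The following is a minimal sketch, not part of the original grammar: it adds a hypothetical second argument to expression that counts how many nested calls have been made since the last character was consumed, and throws a hypothetical no_progress/1 term once that count passes an arbitrary limit of 3. The input is built with atom_codes/2 so that the example does not depend on how the system reads double-quoted text in a query.

```prolog
% Instrumented copy of the left-recursive grammar (illustration only).
% Depth = nested calls to expression//2 made without consuming any input.
expression(_, Depth) --> { Depth > 3, throw(no_progress(Depth)) }.
expression(N, _) --> number(Cs), { number_codes(N, Cs) }.
expression(N, _) --> "(", expression(N, 0), ")".
expression(N, D) -->
    { D1 is D + 1 },
    expression(X, D1),        % same input position, one level deeper
    "+", expression(Y, 0),    % "+" was consumed, so the count restarts
    { N is X + Y }.
expression(N, D) -->
    { D1 is D + 1 },
    expression(X, D1),
    "-", expression(Y, 0),
    { N is X - Y }.

number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

% ?- atom_codes('12+4', Cs), phrase(expression(X, 0), Cs).
%    first answer: X = 16
% ?- atom_codes('12-4', Cs),
%    catch(phrase(expression(_, 0), Cs), no_progress(D), true).
%    D = 4
```

In the original grammar there is no such guard, so the nesting simply continues until the local stack is exhausted.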

Short answer: left recursion is fatal in DCGs.

It's also fatal in recursive-descent parsers, and the solution is much the same: don't recur to the left.

A non-left-recursive version of your grammar would be:

expression(N) --> term(N).
expression(N) --> term(X), "+", expression(Y), { N is X + Y }.
expression(N) --> term(X), "-", expression(Y), { N is X - Y }.
term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".
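
As a quick check, this grammar can be loaded together with the number and digit rules from the question. This is only a sketch; the queries build the input with atom_codes/2, since on some systems (recent SWI-Prolog versions among them) double-quoted text in a query may be read as a string object rather than a list of codes.

```prolog
% Non-left-recursive grammar, completed with the question's helper rules.
expression(N) --> term(N).
expression(N) --> term(X), "+", expression(Y), { N is X + Y }.
expression(N) --> term(X), "-", expression(Y), { N is X - Y }.

term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".

number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

% ?- atom_codes('12-4', Cs), phrase(expression(X), Cs).
%    first answer: X = 8
% ?- atom_codes('(1+2)+3', Cs), phrase(expression(X), Cs).
%    first answer: X = 6
% ?- atom_codes('12-4-3', Cs), phrase(expression(X), Cs).
%    first answer: X = 11, because the input is grouped as 12-(4-3)
```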

However, this makes "-" right associative, and requires the initial term to be reparsed repeatedly in many cases, so a common approach in code intended for production is to do something less like the BNF you started with and more like the following EBNF version:

expression = term {("+"|"-") term}
term = number | "(" expression ")".

The way I learned to write it (long enough ago that I no longer remember whom to credit for it) is something like this (I found it ugly at first, but it grows on you):

expression(N) --> term(X), add_op_sequence(X,N).
add_op_sequence(LHS0, Result) -->
    "+", term(Y),
    {LHS1 is LHS0 + Y},
    add_op_sequence(LHS1,Result).
add_op_sequence(LHS0, Result) -->
    "-", term(Y),
    {LHS1 is LHS0 - Y},
    add_op_sequence(LHS1,Result).
add_op_sequence(N,N) --> [].

term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".

The value accumulated so far is passed down in the left-hand argument of add_op_sequence and eventually (when the sequence ends with the empty production) passed back up as a result.
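
To see the accumulator at work, here is a self-contained sketch that combines the rules above with number and digit from the question. The product and mul_op_sequence nonterminals are hypothetical additions, not part of the answer: they apply the same pattern one level down to cover the * and / productions of the original grammar, on the assumption that those operators should bind tighter than + and -.

```prolog
expression(N) --> product(X), add_op_sequence(X, N).

% The first argument carries the value so far; the second is the final result.
add_op_sequence(LHS0, Result) -->
    "+", product(Y),
    { LHS1 is LHS0 + Y },
    add_op_sequence(LHS1, Result).
add_op_sequence(LHS0, Result) -->
    "-", product(Y),
    { LHS1 is LHS0 - Y },
    add_op_sequence(LHS1, Result).
add_op_sequence(N, N) --> [].

% Hypothetical extra level for "*" and "/", using the same accumulator pattern.
product(N) --> term(X), mul_op_sequence(X, N).

mul_op_sequence(LHS0, Result) -->
    "*", term(Y),
    { LHS1 is LHS0 * Y },
    mul_op_sequence(LHS1, Result).
mul_op_sequence(LHS0, Result) -->
    "/", term(Y),
    { LHS1 is LHS0 / Y },     % "/" follows the host system's arithmetic rules
    mul_op_sequence(LHS1, Result).
mul_op_sequence(N, N) --> [].

term(N) --> number(Cs), { number_codes(N, Cs) }.
term(N) --> "(", expression(N), ")".

number([D|Ds]) --> digit(D), number(Ds).
number([D])    --> digit(D).

digit(D) --> [D], { code_type(D, digit) }.

% ?- atom_codes('12-4-3', Cs), phrase(expression(X), Cs).
%    X = 5, grouped as (12-4)-3
% ?- atom_codes('2+3*4', Cs), phrase(expression(X), Cs).
%    X = 14
% ?- atom_codes('(1+2)+3', Cs), phrase(expression(X), Cs).
%    X = 6
```

Because each operator sequence consumes its operand before recursing, no call is ever repeated at the same input position, and the running value is updated from left to right.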

The parsing strategy known as 'left-corner parsing' is a way of dealing with this problem; books on the use of Prolog in natural-language processing will almost invariably discuss it.

