Unintentional concatenation in Bison/Yacc grammar

I am experimenting with lex and yacc and have run into a strange issue, but I think it is best to show my code before describing the issue. This is my lexer:

%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
%}

%%

[a-zA-Z]+ {
  yylval.strV = yytext;
  return ID;
}

[0-9]+      {
  yylval.intV = atoi(yytext);
  return INTEGER;
}

[\n] { return *yytext; }

[ \t]        ;

. yyerror("invalid character");

%%

int yywrap(void) {
  return 1;
}

This is my parser:

%{
#include <stdio.h>

int yydebug=1;
void prompt();
void yyerror(char *);
int yylex(void);
%}

%union {
  int intV;
  char *strV;
}

%token INTEGER ID

%%

program: program statement EOF { prompt(); }
       | program EOF { prompt(); }
       | { prompt(); }
       ;

args: /* empty */
    | args ID { printf(":%s ", $<strV>2); }
    ;

statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

EOF: '\n'

%%

void yyerror(char *s) {
  fprintf(stderr, "%s\n", s);
}

void prompt() {
  printf("> ");
}

int main(void) {
  yyparse();
  return 0;
}

A very simple language, consisting of nothing more than strings and integers, plus a basic REPL. Note that the parser prints each of the args with a leading colon. The intention is that, combined with the first alternative of the statement rule, an interaction with the REPL would look something like this:

> aaa aa a
:aa :a aaa>

However, the actual interaction is this:

> aaa aa a
:aa :a aaa aa aa
>

Why does the token ID in the following rule

statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

have the semantic value of the entire input line, newline included? How can my grammar be reworked to produce the interaction I intended?

You have to copy token strings as they are read if you want them to remain valid, because yytext points into the scanner's own buffer, which changes as further tokens are read. To show this, I modified the statement rule to read:

statement: ID { printf("<%s> ", $<strV>1); } args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

Then, with your input, I get the output:

> aaa aa a
<aaa> :aa :a aaa aa a
>

Note that at the time the initial ID is read, the token is exactly what you expected. But because you did not preserve the token, the string it points to has been modified by the time you get back to printing it, after the args have been parsed.
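To make the effect concrete, here is a minimal sketch in plain C rather than lex. The Scanner type and the functions scanner_init, next_token and copy_token are hypothetical names made up for this illustration; they only imitate the way a lex-generated scanner hands out a pointer into its own buffer and NUL-terminates the current token in place. They are not part of lex or yacc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lex-generated scanner. Like yytext, the
 * pointer it returns aims into the scanner's own buffer, and the token is
 * NUL-terminated in place; the overwritten character is restored on the
 * next call. */
typedef struct {
    char *buf;     /* writable input buffer */
    size_t pos;    /* index of the next unread character */
    char held;     /* character currently replaced by the temporary NUL */
    int has_held;  /* nonzero while buf[pos] holds that temporary NUL */
} Scanner;

void scanner_init(Scanner *s, char *buf) {
    s->buf = buf;
    s->pos = 0;
    s->held = '\0';
    s->has_held = 0;
}

/* Returns the next token (a run of non-blank characters, or a lone
 * newline), or NULL at the end of the input. */
char *next_token(Scanner *s) {
    if (s->has_held) {               /* undo the previous temporary NUL */
        s->buf[s->pos] = s->held;
        s->has_held = 0;
    }
    while (s->buf[s->pos] == ' ' || s->buf[s->pos] == '\t')
        s->pos++;                    /* skip blanks between tokens */
    if (s->buf[s->pos] == '\0')
        return NULL;
    char *start = &s->buf[s->pos];
    if (*start == '\n') {
        s->pos++;                    /* a newline is a token by itself */
    } else {
        while (s->buf[s->pos] != '\0' && s->buf[s->pos] != ' ' &&
               s->buf[s->pos] != '\t' && s->buf[s->pos] != '\n')
            s->pos++;
    }
    s->held = s->buf[s->pos];        /* terminate the token in place */
    s->has_held = 1;
    s->buf[s->pos] = '\0';
    return start;
}

/* Private copy of a token; the caller owns it and must free() it. */
char *copy_token(const char *text) {
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}
```

If the buffer holds "aaa aa a\n", the pointer returned for the first token reads "aaa" at first, but once the newline token has been scanned the same pointer reads the whole line, which is the behaviour seen in the question. A copy made with copy_token right after the first call still reads "aaa". In the lexer from the question, the corresponding change would be to assign strdup(yytext) to yylval.strV instead of yytext itself, and to free the string once the parser action has used it.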

I think there is an associativity conflict between the args and statement productions. This is borne out by the (partial) contents of the parser.output file produced by bison -v:

Nonterminals, with rules where they appear

$accept (6)
    on left: 0
program (7)
    on left: 1 2 3, on right: 0 1 2
statement (8)
    on left: 4 5, on right: 1
args (9)
    on left: 6 7, on right: 4 7
EOF (10)
    on left: 8, on right: 1 2

Indeed, I'm having a hard time figuring out what your grammar is trying to accept. As a side note, I'd probably move your EOF production into the lexer as an EOL token; this will make resynchronizing on parse errors easier.

A better explanation of your intent would be helpful.
