Grammar to Lex/Yacc

I have been tasked with a project that involves taking a grammar (in BNF form) and creating a lexical scanner (using lex) and a parser (using bison). I've never worked with either of these programs, and I think a good reference would be to see how these items are created from a grammar. I am looking for a grammar and its associated .l and .ypp files, preferably in C++. I've been able to find sample files or sample grammars, but not both together. I've spent some time searching and could not find anything. I figured I'd post here in hopes that someone has something for me, but I will continue searching in the meantime.

I am currently reading Tom Niemann's tutorial, http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf, which seems to be pretty well written and understandable.

Thanks

Edit: I am still searching, and I am starting to think that what I am looking for does not exist. Google usually never fails me!

Edit 2: Maybe if I provide some of the grammar, you folks could show me what the appropriate .l and .ypp files would look like. This is just a snippet of the grammar; I just need a little 'taste' of how this works and I think I can take it from there.

Grammar:

Program ::= Compound
Statements ::= Compound | Assignment | ...
Assignment ::= Var ASSIGN Expression
Expression ::= Var | Operator Expression Expression | Number
Compound ::= START Statements END
Number ::= NUMBER

Descriptions:

ASSIGN is the assignment sign ":="

Var is an identifier that begins with a lower case letter and is followed by lower case letters or digits

START is the "start" keyword

END is the "end" keyword

Operator is "+", "-", "*", "/"

Number is decimal digits which could potentially be negative (minus sign in front)

Most of this is fairly straightforward. One part, however, is decidedly problematic. You've defined a number to (potentially) include a leading -, and that's a problem.

The problem is pretty simple. Given an input like 321-123, it's essentially impossible for the lexer (which won't normally keep track of current state) to guess whether that's supposed to be two tokens (321 and -123) or three (321, -, and 123). In this case, the - is almost certainly intended to be separate from the 123, but if the input were 321 + -123 you'd apparently want -123 as a single token instead.
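To make the ambiguity concrete, here is a minimal sketch in plain C (not lex output) of a longest-match scanner. The function name count_tokens and its signed_numbers flag are hypothetical, introduced only for this illustration; the flag switches the number rule between -?[0-9]+ and [0-9]+.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the tokens a longest-match scanner would
 * produce for src. When signed_numbers is nonzero, the number rule is
 * -?[0-9]+ (the question's definition); otherwise it is [0-9]+ and a
 * '-' is always its own token. Only digits, '-', '+' and blanks are
 * treated specially; any other character counts as a one-character token. */
size_t count_tokens(const char *src, int signed_numbers)
{
    assert(src != NULL);
    size_t count = 0;
    const char *p = src;

    while (*p != '\0') {
        if (*p == ' ') {            /* skip blanks */
            p++;
            continue;
        }
        /* Signed rule: a '-' directly followed by a digit starts a number. */
        if (signed_numbers && *p == '-' && isdigit((unsigned char)p[1]))
            p++;                    /* consume the sign */
        if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;                /* longest run of digits */
        } else {
            p++;                    /* single-character token */
        }
        count++;
    }
    return count;
}
```

With the signed rule, count_tokens("321-123", 1) sees only two tokens, so the subtraction is lost; with the unsigned rule, count_tokens("321-123", 0) sees three. The scanner alone has no way to pick the right rule per input.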

To deal with that, you probably want to change your grammar so the leading - isn't part of the number. Instead, always treat the - as an operator, and let the number itself be composed solely of digits. Then it's up to the parser to sort out whether each - in an expression is unary or binary.
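As a rough illustration of how a parser can tell the two uses apart by position, here is a hand-written sketch in plain C. It is not what yacc or bison generates, it handles only infix + and - over integers, and the names Cursor, parse_value and eval_expression are hypothetical. A - seen where a value must begin is unary; a - seen after a complete value is binary.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical cursor over the input; not part of lex or yacc. */
typedef struct { const char *p; } Cursor;

static void skip_blanks(Cursor *c)
{
    while (*c->p == ' ')
        c->p++;
}

/* value : NUMBER | MINUS NUMBER   (the '-' here is unary) */
static long parse_value(Cursor *c)
{
    long sign = 1, n = 0;
    skip_blanks(c);
    if (*c->p == '-') {             /* a '-' where a value must start */
        sign = -1;
        c->p++;
        skip_blanks(c);
    }
    assert(isdigit((unsigned char)*c->p));  /* sketch: no error recovery */
    while (isdigit((unsigned char)*c->p))
        n = n * 10 + (*c->p++ - '0');
    return sign * n;
}

/* expression : value | expression '+' value | expression '-' value
 * (the '-' here is binary, because a value has already been read) */
long eval_expression(const char *src)
{
    assert(src != NULL);
    Cursor c = { src };
    long result = parse_value(&c);

    for (;;) {
        skip_blanks(&c);
        if (*c.p == '+') {
            c.p++;
            result += parse_value(&c);
        } else if (*c.p == '-') {
            c.p++;
            result -= parse_value(&c);
        } else {
            break;                  /* end of input or unknown character */
        }
    }
    return result;
}
```

Both eval_expression("321-123") and eval_expression("321 + -123") yield 198: the scanner-level question of whether - belongs to the number never arises, because the decision is made from where the - appears in the expression.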

Taking that into account, the lexer file would look something like this:

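%{
#include <stdio.h>    /* fputs and fputc, used by yyerror below */
#include "y.tab.h"    /* token codes generated by yacc -d (or bison -y -d) */
%}

%option noyywrap case-insensitive
%%

":="      { return ASSIGN;   }
start     { return START;    }
end       { return END;      }
[+/*]     { return OPERATOR; }
"-"       { return MINUS;    }
[0-9]+    { return NUMBER;   }
[a-z][a-z0-9]* { return VAR; }
[ \t\r\n] { ; }   /* skip whitespace, including tabs */

%%

void yyerror(char const *s) { fputs(s, stderr); fputc('\n', stderr); }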

The matching yacc file would look something like this:

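%{
int yylex(void);                /* provided by the lex-generated scanner */
void yyerror(char const *s);    /* defined in the lexer file */
%}

%token ASSIGN START END OPERATOR MINUS NUMBER VAR
/* The lexer returns OPERATOR and MINUS, never character literals such
   as '+' or '-', so associativity is declared on those token names.
   All operators share one left-associative precedence level. */
%left OPERATOR MINUS
%%

program : compound
            ;

statement : compound
            | assignment
            ;

assignment : VAR ASSIGN expression
            ;

statements :
            | statements statement
            ;

expression : VAR
            | expression OPERATOR expression
            | expression MINUS expression
            | value
            ;

value: NUMBER
     | MINUS NUMBER
     ;

compound : START statements END
            ;

%%

int main(void) {
    return yyparse();   /* 0 if the input was accepted, nonzero otherwise */
}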

Note: I've tested these only extremely minimally, just enough to verify that input I believe is grammatical, such as start a:=1 b:=2 end and start a:=1+3*3 b:=a+4 c:=b*3 end, is accepted (no error message printed out), and that input I believe is ungrammatical, such as 9:=13 and a=13, prints a syntax error message. Since this doesn't attempt to do any more with the expressions than recognize which are or are not grammatical, that's about the best we can do.
