
Lexical & Grammar Analysis using Yacc and Lex

I am fairly new to Yacc and Lex programming, but I am training myself by writing an analyser for C programs.

However, I am facing a small issue that I haven't managed to solve.

When there is a declaration such as int a,b; I want to save a and b in a simple array. I did manage to do that, but it is saving a bit more than I wanted: it is actually saving "a," or "b;" instead of "a" and "b".

It should have worked, as $1 should only return tID, which is a regular expression matching only an identifier. I don't understand why it takes the comma even though the comma is defined as a separate token. Does anyone know how to solve this problem?

Here are the corresponding yacc declarations:

Declaration :
    tINT Decl1 DeclN
        {printf("Declaration %s\n", $2);}
    | tCONST Decl1 DeclN
        {printf("Declaration %s\n", $2);}
;

Decl1 :
    tID 
        {$$ = $1;
        tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
        printf("Added %s at address %d\n", $1, compteur);
        compteur++;}
    | tID tEQ E
        {$$ = $1;
        tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
        printf("Added %s at address %d\n", $1, compteur);
        pile[compteur]=$3;
        compteur++;}
;

DeclN :
    /*epsilon*/
    | tVIR Decl1 DeclN
;

And here is the extract of the Lex file:

separateur [ \t\n\r]
id [A-Za-z][A-Za-z0-9_]*
nb [0-9]+
nbdec [0-9]+\.[0-9]+
nbexp [0-9]+e[0-9]+

%%

","                     { return tVIR; }
";"                     { return tPV; }
"="                     { return tEQ; }

{separateur}            ;
{id}                   { yylval.str = yytext; return tID; }
{nb}|{nbdec}|{nbexp}   { yylval.nb = atoi(yytext); return tNB; }


%%
int yywrap() {return 1;}

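The problem is that yytext is a pointer into lex's token scanning buffer, so it is only valid until the next time the parser calls yylex. With flex, the scanner NUL-terminates each token in place and puts the overwritten character back when it scans the next token. Because the parser has to read one lookahead token (the , or the ;) before it can reduce Decl1, the pointer you saved reads "a," or "b;" by the time your action runs, and later scans can change it again. You need to make a copy of the string in yytext if you want to return it. Something like: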

{id}                   { yylval.str = strdup(yytext); return tID; }

will do the trick, though it also exposes you to the possibility of memory leaks, since every copy is allocated with malloc and must eventually be freed.

Also, in general, when writing lex/yacc parsers involving single-character tokens, it is much clearer to use them directly as character constants (e.g. ',', ';', and '=') rather than defining named tokens (tVIR, tPV, and tEQ in your code).

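Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.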

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM