
Lexical & Grammar Analysis using Yacc and Lex

I am fairly new to Yacc and Lex programming, but I am training myself by writing an analyser for C programs.

However, I am facing a small issue that I didn't manage to solve.

When there is a declaration such as int a,b; I want to save a and b in a simple array. I managed to do that, but it saves a bit more than wanted: it actually stores "a," or "b;" instead of "a" and "b".

It should have worked, since $1 should only return tID, which is a regular expression matching only an identifier. I don't understand why it takes the comma even though the comma is defined as its own token. Does anyone know how to solve this problem?

Here are the corresponding yacc declarations:

Declaration :
    tINT Decl1 DeclN
        {printf("Declaration %s\n", $2);}
    | tCONST Decl1 DeclN
        {printf("Declaration %s\n", $2);}
;

Decl1 :
    tID 
        {$$ = $1;
        tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
        printf("Added %s at adress %d\n", $1, compteur);
        compteur++;}
    | tID tEQ E
        {$$ = $1;
        tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
        printf("Added %s at adress %d\n", $1, compteur);
        pile[compteur]=$3;
        compteur++;}
;

DeclN :
    /*epsilon*/
    | tVIR Decl1 DeclN

And here is the extract of the Lex file:

separateur [ \t\n\r]
id [A-Za-z][A-Za-z0-9_]*
nb [0-9]+
nbdec [0-9]+\.[0-9]+
nbexp [0-9]+e[0-9]+

","                     { return tVIR; }
";"                     { return tPV; }
"="                     { return tEQ; }

{separateur}            ;
{id}                   { yylval.str = yytext; return tID; }
{nb}|{nbdec}|{nbexp}   { yylval.nb = atoi(yytext); return tNB; }


%%
int yywrap() {return 1;}

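The problem is that yytext points into lex's internal token-scanning buffer, so its contents are only valid until the next time the parser calls yylex. By the time the action for Decl1 runs, the parser has most likely already read the lookahead token (the comma or the semicolon), so the saved pointer now sees "a," rather than "a". You need to make a copy of the string in yytext if you want to return it. Something like: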

{id}                   { yylval.str = strdup(yytext); return tID; }

will do the trick, though it also exposes you to the possibility of memory leaks: each copy is heap-allocated, so something has to free() it once it is no longer needed.
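To see why the saved pointer changes under you, here is a minimal sketch in plain C of a toy in-place scanner. It is only a simplified model of the behaviour described above, not flex's actual implementation: the names Scanner, scanner_init, scanner_next and copy_token are all hypothetical, and copy_token is a portable stand-in that does the same job as strdup.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy in-place scanner: a hypothetical stand-in for yylex()/yytext. */
typedef struct {
    char *cursor;    /* next unread character in the input buffer */
    char *text;      /* current token text; plays the role of yytext */
    char *hold_pos;  /* where a temporary NUL terminator was written */
    char hold_char;  /* the character that the NUL replaced */
} Scanner;

static void scanner_init(Scanner *s, char *buffer) {
    s->cursor = buffer;
    s->text = NULL;
    s->hold_pos = NULL;
    s->hold_char = '\0';
}

/* Scans one token in place. Returns 1 on success, 0 at end of input. */
static int scanner_next(Scanner *s) {
    /* Undo the terminator written for the previous token. */
    if (s->hold_pos != NULL) {
        *s->hold_pos = s->hold_char;
        s->hold_pos = NULL;
    }
    if (*s->cursor == '\0') {
        return 0;
    }
    s->text = s->cursor;
    if (isalpha((unsigned char)*s->cursor)) {
        /* Identifier: a letter followed by letters, digits or '_'. */
        while (isalnum((unsigned char)*s->cursor) || *s->cursor == '_') {
            s->cursor++;
        }
    } else {
        s->cursor++;  /* any other character is a one-character token */
    }
    /* Terminate the token temporarily, remembering what was overwritten. */
    s->hold_pos = s->cursor;
    s->hold_char = *s->cursor;
    *s->cursor = '\0';
    return 1;
}

/* Heap copy of a token; the caller owns it and must free() it. */
static char *copy_token(const char *text) {
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, text, len);
    }
    return copy;
}
```

With a writable buffer holding "a,b;", a pointer saved from text after the first scanner_next call reads "a" at first, but reads "a," once the second call has scanned the comma. A string obtained from copy_token at the same moment still reads "a", and has to be released with free() when the symbol table entry goes away.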

Also, in general, when writing lex/yacc parsers involving single-character tokens, it is much clearer to use them directly as character constants (e.g. ',', ';', and '=') rather than defining named tokens (tVIR, tPV, and tEQ in your code).
