
Flex (lexical analyzer) not recognizing or operator

I have a problem with flex. It doesn't recognize the or operator (|) in this rule:

[0-9A-Za-z]+{CORRECT} | {CORRECT}[0-9A-Za-z]+ [0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}

If I split it into three rules then it is recognized:

[0-9A-Za-z]+{CORRECT}  {...}
{CORRECT}[0-9A-Za-z]+ { ...}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}

To explain myself better, the pattern I am trying to recognize is:

CORRECT [1-9]*_[1-9]*0

And in order for flex to recognize the CORRECT pattern only when it is not surrounded by other characters I have to add these three rules.

Full flex code:

%option noyywrap

%{
#include <stdio.h>
int num_lines=1;

%}

CORRECT [1-9]*_[1-9]*0

%%
{CORRECT} { printf("CORRECT TOKEN:%s\n",yytext); }

[0-9A-Za-z]+{CORRECT}  { printf("ERROR %d:Unidentified symbol: %s\n",num_lines,yytext);}
{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s  \n",num_lines,yytext); }

"\n" { num_lines++; }

" "
"\t"
"\r"

. { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}

%%

int main(int argc,char **argv)
{
    ++argv,--argc;
    if(argc>0)
        yyin=fopen(argv[0],"r");
    else
        yyin=stdin;
    yylex();
    return 0;
}

Whitespace is significant in a lex pattern: a | b is not the same as a|b. Unescaped whitespace ends the pattern, and the rest of the line is taken as the action. In the troublesome pattern, you have whitespace that I don't think you intended.
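For illustration, here is a minimal sketch of the scanner with the three error alternatives joined by | and no whitespace inside the pattern. The ALNUM definition is a name I made up to keep the line short; it is not in the original code. I have not run this against your input, so treat it as a starting point rather than a verified fix.

```lex
%option noyywrap

%{
#include <stdio.h>
int num_lines = 1;
%}

/* CORRECT is the definition from the question.
   ALNUM is a hypothetical helper name introduced only for this sketch. */
CORRECT [1-9]*_[1-9]*0
ALNUM   [0-9A-Za-z]

%%
{CORRECT}  { printf("CORRECT TOKEN:%s\n", yytext); }

{ALNUM}+{CORRECT}|{CORRECT}{ALNUM}+|{ALNUM}+{CORRECT}{ALNUM}+  { /* one pattern, no blanks around | */ printf("ERROR %d:Unidentified symbol: %s\n", num_lines, yytext); }

\n         { num_lines++; }
[ \t\r]    { /* skip blanks, tabs and carriage returns */ }
.          { printf("ERROR %d:Unidentified symbol: %s\n", num_lines, yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

The only blank on the long rule line is the one separating the complete pattern from its action.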

That said, in my opinion, your 3-pattern solution is easier to read and maintain.
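If the goal is just to avoid repeating the same action three times, flex also accepts an action consisting solely of | which means "same action as the next rule". A sketch of how the three error rules in the question's rules section might look with it (a fragment only, meant to replace those three lines, not a complete scanner):

```lex
[0-9A-Za-z]+{CORRECT}               |
{CORRECT}[0-9A-Za-z]+               |
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+   { printf("ERROR %d:Unidentified symbol: %s\n", num_lines, yytext); }
```

Here the whitespace before each | is fine, because the | sits in the action position rather than inside the pattern.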
