Lex: How do I prevent it from matching against substrings?

Question

For example, I'm supposed to convert "int" to "INT". But if there's the word "integer", I don't think it's supposed to turn into "INTeger".

If I write the rule "int" printf("INT"); then substrings are matched too. Is there a way to prevent this from happening?

Answer 1

I believe the following captures what you want.

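%{
#include <stdio.h>
%}

ws                      [\t\n ]

%%

{ws}int{ws}             { printf("%cINT%c", *yytext, yytext[4]); /* re-emit the whitespace on both sides */ }
.                       { putchar(*yytext); /* copy any other character unchanged */ }

%%

int yywrap(void) { return 1; }  /* single input stream, so no linking against -ll/-lfl */
int main(void) { yylex(); return 0; }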

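This only recognizes int when it is surrounded by whitespace ({ws}, in this case). It misses int at the very start or end of the input, next to punctuation (as in "(int)"), and the second of two consecutive ints ("int int"), because the first match consumes the space they share. To handle other word boundaries you will need to either add more characters to the ws class or add more specific rules.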
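For comparison, here is what a general word-boundary test looks like in plain C, outside lex. This is a sketch with hypothetical helper names (is_word_char, upcase_int_words), and it assumes a word consists of letters, digits, and underscores. int is replaced only when the characters immediately before and after it are not word characters, with the start and end of the string counting as boundaries, so int at the start or end of the string, next to punctuation, or repeated ("int int") is still converted.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: nonzero if c can be part of a word
   (assumed here to be letters, digits and '_'). */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy `in` to `out`, turning each whole-word "int" into "INT".
   The start and end of the string count as word boundaries, so
   "integer" and "print" are left alone. `out` needs room for
   strlen(in) + 1 bytes; the replacement never changes the length. */
static void upcase_int_words(const char *in, char *out)
{
    size_t n = strlen(in);
    for (size_t i = 0; i < n; i++) {
        int at_word_start = (i == 0) || !is_word_char(in[i - 1]);
        if (at_word_start && strncmp(in + i, "int", 3) == 0
                && !is_word_char(in[i + 3])) {
            memcpy(out + i, "INT", 3);
            i += 2;   /* together with the loop's i++ this skips past "int" */
            continue;
        }
        out[i] = in[i];
    }
    out[n] = '\0';
}
```

For example, on the input "int integer print int;" it produces "INT integer print INT;".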

Answer 2

Well, here's how I did it:

"int"[a-zA-Z0-9]+|[a-zA-Z0-9]+"int"    ECHO;
"int" printf("INT");

Better suggestions are welcome.

Answer 3

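Lex always chooses the rule that gives the longest match for the current input; if two rules match the same number of characters, it chooses the one listed first. To avoid substring matches, you need an additional rule that can match a longer string than int. The easiest way to do this is a rule that matches a whole word, i.e. [a-zA-Z]+. On "integer" it matches all seven characters and beats int; on "int" both rules match three characters, and the int rule wins because it comes first. The entire lex program would look like this: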

%%

[\t ]+          { ECHO; /* copy whitespace through unchanged */ }
int             { printf("INT"); }
[a-zA-Z]+       { ECHO; /* catch-all: whole words beat "int" by longest match */ }

%%

int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    return 0;
}
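To see why "integer" survives, the selection policy can be imitated by hand. The plain-C sketch below is not what lex generates (lex compiles all patterns into a single DFA). Instead it hard-codes the three patterns of the program above as hypothetical matcher functions, tries each one at the current position, keeps the longest match, and lets the earlier rule win a tie. It assumes whitespace and non-int words are echoed, and that a character no rule matches is copied, as lex's default rule does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical matchers, one per rule of the lex program, in rule order.
   Each returns the length of the longest match at s, or 0 for no match. */
static size_t match_ws(const char *s)      /* [\t ]+    */
{
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t')
        n++;
    return n;
}

static size_t match_int(const char *s)     /* int       */
{
    return strncmp(s, "int", 3) == 0 ? 3 : 0;
}

static size_t match_word(const char *s)    /* [a-zA-Z]+ */
{
    size_t n = 0;
    while ((s[n] >= 'a' && s[n] <= 'z') || (s[n] >= 'A' && s[n] <= 'Z'))
        n++;
    return n;
}

/* Scan `in` the way yylex() picks rules and write the result to `out`,
   which needs room for strlen(in) + 1 bytes. */
static void scan(const char *in, char *out)
{
    size_t (*const rules[])(const char *) = { match_ws, match_int, match_word };
    const size_t nrules = sizeof rules / sizeof rules[0];

    while (*in != '\0') {
        size_t best_len = 0, best_rule = nrules;   /* nrules means "no rule matched" */
        for (size_t r = 0; r < nrules; r++) {
            size_t len = rules[r](in);
            if (len > best_len) {   /* only a strictly longer match displaces an earlier rule */
                best_len = len;
                best_rule = r;
            }
        }
        if (best_rule == nrules) {  /* lex's default rule: copy one character */
            *out++ = *in++;
            continue;
        }
        if (best_rule == 1)         /* the int rule: printf("INT") */
            memcpy(out, "INT", 3);
        else                        /* whitespace and other words: ECHO */
            memcpy(out, in, best_len);
        out += best_len;
        in += best_len;
    }
    *out = '\0';
}
```

With this policy, "integer" is seven characters for [a-zA-Z]+ against three for int, so it is echoed, while a lone "int" ties at three characters and the earlier int rule wins. Listing int after [a-zA-Z]+ would hand every tie to the catch-all, and int would never be converted.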


3 answers

solution1
2 ACCEPTED 2010-03-02 01:30:53

solution2
1 2010-03-01 21:46:42

solution3
1 2010-03-02 01:18:37
