Regular expression for a string literal in flex/lex

I'm experimenting to learn flex and would like to match string literals. My code currently looks like:

"\""([^\n\"\\]*(\\[.\n])*)*"\""        {/*matches string-literal*/;}

I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.

I am probably just writing a poor regular expression or one incompatible with flex. Please advise!

A string consists of a quote mark

"

followed by zero or more of either an escaped anything

\\.

or a character that is neither a quote nor a backslash

[^"\\]

and finally a terminating quote

"

Put it all together, and you've got

\"(\\.|[^"\\])*\"

The delimiting quotes are escaped because they are Flex meta-characters.
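
To see what that pattern accepts without running flex, here is a rough hand-written C equivalent. It is only a sketch: the name match_string_literal is made up for illustration, and it assumes a NUL-terminated buffer. Like the flex pattern, it lets a raw newline through inside the string (a negated class such as [^"\\] matches newline in flex) and it refuses a backslash followed by a newline (because . does not match newline).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex.
 * Returns the length of the string literal that starts at s[0],
 * or 0 if s does not begin with a complete literal.
 * Intended to mirror the pattern  \"(\\.|[^"\\])*\"  */
size_t match_string_literal(const char *s)
{
    assert(s != NULL);

    if (s[0] != '"')
        return 0;                       /* must start with a quote */

    size_t i = 1;
    while (s[i] != '\0') {
        if (s[i] == '"')
            return i + 1;               /* terminating quote: done */

        if (s[i] == '\\') {
            /* \\.  : a backslash plus any character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i += 1;                     /* [^"\\] : any other character */
        }
    }
    return 0;                           /* input ended before the closing quote */
}
```

Calling match_string_literal on a buffer that starts with "abc" (quotes included) gives 5, the length of the whole literal; a buffer with no closing quote gives 0.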

For a single line... you can use this:

\"([^\\\"\n]|\\.)*\"  {/*matches string-literal on a single line*/;}

How about using a start state...

%{
int enter_dblquotes = 0;
%}

%x DBLQUOTES
%%

\"  { BEGIN(DBLQUOTES); enter_dblquotes++; }

<DBLQUOTES>[^"]*\" {
   if (enter_dblquotes){
       /* yytext holds the string body plus the closing quote */
       handle_this_dblquotes(yytext);
       BEGIN(INITIAL); /* revert back to normal */
       enter_dblquotes--;
   }
}
         ...more rules follow...

Something to that effect (flex uses %s or %x to declare which start states exist). When the scanner sees a quote, it switches to another state and keeps lexing until it reaches the next quote, at which point it reverts to the normal state.
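
If the state switching is hard to picture, the same idea can be written by hand in plain C. This is just a sketch under my own assumptions: the enum, its values, and count_quoted_strings are invented names, and like the rules above it does not treat backslash escapes specially.

```c
#include <assert.h>
#include <stddef.h>

/* Two states, standing in for flex's INITIAL and the exclusive DBLQUOTES. */
enum scan_state { ST_INITIAL, ST_DBLQUOTES };

/* Hypothetical helper: counts the double-quoted strings in input.
 * Returns -1 if the input ends while still inside a string. */
int count_quoted_strings(const char *input)
{
    assert(input != NULL);

    enum scan_state state = ST_INITIAL;
    int count = 0;

    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] != '"')
            continue;                   /* ordinary character in either state */

        if (state == ST_INITIAL) {
            state = ST_DBLQUOTES;       /* like BEGIN(DBLQUOTES) */
        } else {
            count++;                    /* closing quote: one string finished */
            state = ST_INITIAL;         /* like BEGIN(INITIAL) */
        }
    }
    return state == ST_INITIAL ? count : -1;
}
```

The point of the start state in flex is the same as the state variable here: a quote means "open" or "close" depending on where the scanner currently is.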

Here is my code snippet for handling strings in flex; I hope it gives you some ideas.

Using a start condition to handle string literals is more scalable and clearer.

%x SINGLE_STRING

%%

\"                          BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
  \n                        yyerror("the string misses \" to terminate before newline");
  <<EOF>>                   { yyerror("the string misses \" to terminate before EOF"); yyterminate(); }
  ([^\\\"\n]|\\.)*          {/* do your work like save in here */}
  \"                        BEGIN(INITIAL);
  .                         ;
}
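
For the "save" step in the string-body rule, something along these lines could be called from the action with yytext and yyleng. It is a sketch only: save_string_body is a made-up name, and decoding just \n and \t (keeping every other escaped character as itself) is my assumption, not something flex prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a string body (the text between the quotes)
 * into dst, decoding a few backslash escapes on the way.
 * Returns the number of bytes written, not counting the trailing NUL,
 * or -1 if dst is too small. */
long save_string_body(const char *body, size_t len, char *dst, size_t cap)
{
    assert(body != NULL && dst != NULL);

    if (cap == 0)
        return -1;                      /* no room even for the NUL */

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        char c = body[i];
        if (c == '\\' && i + 1 < len) {
            c = body[++i];              /* look at the escaped character */
            switch (c) {
            case 'n': c = '\n'; break;  /* assumed escape: newline */
            case 't': c = '\t'; break;  /* assumed escape: tab */
            default:  break;            /* \" \\ etc.: keep the character itself */
            }
        }
        if (out + 1 >= cap)
            return -1;                  /* need space for c and the NUL */
        dst[out++] = c;
    }
    dst[out] = '\0';
    return (long)out;
}
```

Inside the SINGLE_STRING rule the call would look like save_string_body(yytext, yyleng, buf, sizeof buf), where buf is whatever buffer your parser uses for the current literal.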

A late answer, but useful for the next person who needs it:

\"(([^\"]|\\\")*[^\\])?\"

This is what we use in Zolang for single-line string literals with embedded templates ${...}:

\\"(\\$\\{.*\\}|\\\\.|[^\\"\\\\])*\\"
