
Representing a character literal in flex

I am trying to write regular expressions for Java string literals and character literals using flex.

I was able to write the string literal pattern correctly, as you can see below, but I cannot get the character literal pattern to work. It extracts only the first character.

For example, my Java program has the following two variables:

String test_string = "Java is an artificial language.";
char c2  = '\u0041';

My flex file contains:

SP  (u8|u|U|L)
ES  (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))
WS  [ \t\v\n\f]
%%
({SP}?\"([^"\\\n]|{ES})*\"{WS}*)+   {printf("that's string %s\n",yytext);}
'[^'\\\n]|{ES}' {printf("that's char %s\n",yytext);}

The result is:

id:test_string
that's string "Java is an artificial language."
id:char
id:c2
id:u0041
that's char ';

In flex, | binds more loosely than concatenation, so '[^'\\\n]|{ES}' means '[^'\\\n] or {ES}' . I suppose you wanted:

'([^'\\\n]|{ES})'

In addition, your pattern macro ES does not recognize Unicode escapes of the form \uXXXX , so you'll need to add those if you want to recognize '\u0041' .
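
Putting the two fixes together, a minimal self-contained sketch might look like the following. The u+[a-fA-F0-9]{4} alternative is my addition, assuming Java-style Unicode escapes (a backslash, one or more u, then four hex digits). The %option line, the catch-all rule and main are only there so the file builds on its own; they are not taken from your scanner.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
SP  (u8|u|U|L)
ES  (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+|u+[a-fA-F0-9]{4}))
WS  [ \t\v\n\f]
%%
({SP}?\"([^"\\\n]|{ES})*\"{WS}*)+   { printf("that's string %s\n", yytext); }
'([^'\\\n]|{ES})'                   { printf("that's char %s\n", yytext); }
.|\n                                { /* placeholder: skip everything else */ }
%%
int main(void) { return yylex(); }
```

The parentheses around ([^'\\\n]|{ES}) are what keep both apostrophes outside the alternation, so a literal such as '\u0041' can be matched as a whole instead of falling apart at the backslash.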

Personally, I think you are trying to do too much with your escape pattern. I usually just use \\. or \\(.|\n) , for example (the second pattern allows for line continuations, if they have not already been removed by a prior operation). If you want to recognize only correct escapes, then you also need to think through your response to incorrect escapes. Remember that a lexical scanner needs to do something with every possible input, not just every legal input.
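
As a rough illustration of the permissive approach, the sketch below accepts any backslash-plus-character pair inside a literal and leaves validation of the escape to a later stage. Note that \\. consumes only one character after the backslash, so I have assumed a repeated body for the character literal to let longer escapes such as \u0041 fit; that choice, the messages and the catch-all rule are mine, not part of your scanner.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
\"([^"\\\n]|\\.)*\"   { printf("that's string %s\n", yytext); }
'([^'\\\n]|\\.)*'     { printf("that's char %s\n", yytext); /* check the contents later */ }
.|\n                  { /* placeholder: skip everything else */ }
%%
int main(void) { return yylex(); }
```

The trade-off is that the scanner will also accept things like '\q' or 'ab', so whatever consumes the token has to check the contents and report a proper error.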

Without seeing your entire flex input I cannot tell for sure, but my guess is that you have a fallback rule like . { return *yytext; } . That's all very good, but if you reject character and string literals with invalid escape patterns, such literals will end up invoking the fallback rule, which recognizes only the initial quote (or apostrophe). That will almost certainly produce an error in the parser, but it will prove difficult to recover from that error, because you will then be scanning the rest of the string or character literal as though it were unquoted (and will thus end up scanning what follows the closing quote or apostrophe as though it were quoted).
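
One way to keep malformed literals away from the fallback rule is to add deliberately loose error rules after the strict one. The sketch below is only an assumption about how that could be arranged: the error messages are placeholders, the ES definition includes a Unicode alternative that is not in your original, and the last rule stands in for your existing fallback.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
ES  (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+|u+[a-fA-F0-9]{4}))
%%
'([^'\\\n]|{ES})'     { printf("that's char %s\n", yytext); }
'([^'\\\n]|\\.)*'     { fprintf(stderr, "bad char literal %s\n", yytext); }
'([^'\\\n]|\\.)*      { fprintf(stderr, "unterminated char literal %s\n", yytext); }
.|\n                  { /* stand-in for the rest of the scanner */ }
%%
int main(void) { return yylex(); }
```

Flex prefers the longest match and, among matches of equal length, the rule listed first, so a well-formed literal still goes to the strict rule. A literal with a bad escape or too many characters is consumed up to its closing apostrophe by the second rule, and one that never closes is consumed up to the end of the line by the third, which leaves the scanner in a sane state for whatever follows.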
