Representing a character literal in flex

I am trying to write regular expressions for Java string literals and character literals using flex.

I was able to write the string literal rule correctly, as you can see below, but I cannot get the regular expression for character literals right. It extracts only the first character.

For example, my Java program has the following two variables:

String test_string = "Java is an artificial language.";
char c2  = '\u0041';

My flex file contains:

SP  (u8|u|U|L)
ES  (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))
WS  [ \t\v\n\f]
%%
({SP}?\"([^"\\\n]|{ES})*\"{WS}*)+   {printf("that's string %s\n",yytext);}
'[^'\\\n]|{ES}' {printf("that's char %s\n",yytext);}

The result is:

id:test_string
that's string "Java is an artificial language."
id:char
id:c2
id:u0041
that's char ';

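'[^'\\\n]|{ES}' means '[^'\\\n] or {ES}' because the | operator has lower precedence than concatenation. The first alternative matches an apostrophe followed by one ordinary character, and the second matches an escape sequence followed by an apostrophe. That is why the output shows that's char ';: the closing apostrophe together with the semicolon matches the first alternative. I suppose you wanted: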

'([^'\\\n]|{ES})'

In addition, your pattern macro ES does not recognize Unicode escapes of the form \uXXXX. So you'll need to add those if you want to recognize '\u0041'.
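Putting the two fixes together, here is a minimal sketch of a complete scanner file. The macro name UE is my own invention, and its pattern assumes the Java form of a Unicode escape (a backslash, one or more u characters, then exactly four hex digits). The catch-all rule and main exist only to make the sketch self-contained; they are not from the question.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
/* SP, ES and WS are copied from the question. */
SP  (u8|u|U|L)
ES  (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))
/* UE is a hypothetical helper: backslash, one or more 'u', four hex digits. */
UE  (\\u+[a-fA-F0-9]{4})
WS  [ \t\v\n\f]
%%
({SP}?\"([^"\\\n]|{ES}|{UE})*\"{WS}*)+   { printf("that's string %s\n", yytext); }
'([^'\\\n]|{ES}|{UE})'                   { printf("that's char %s\n", yytext); }
.|\n                                     { /* sketch only: skip everything else */ }
%%
int main(void) { yylex(); return 0; }
```

With the parentheses in place, the apostrophes delimit the whole alternation, so '\u0041' should come out as a single character token rather than being split at the backslash.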

Personally, I think you are trying to do too much with your escape pattern. I usually just use \\. or \\(.|\n), for example (the second pattern allows for line continuations, if they have not already been removed by a prior operation). If you want to recognize only correct escapes, then you also need to think through your response to incorrect escapes. Remember that a lexical scanner needs to do something with every possible input, not just every legal input.
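As a rough illustration of the permissive approach, the sketch below accepts any backslash-plus-character pair inside a literal and leaves validation of the contents to a later stage. The catch-all rule and main are assumptions added so the file stands on its own.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
%%
\"([^"\\\n]|\\.)*\"   { printf("that's string %s\n", yytext); }
'([^'\\\n]|\\.)*'     { /* also accepts '' and 'ab'; check the contents later */
                        printf("that's char %s\n", yytext); }
.|\n                  { /* sketch only: skip everything else */ }
%%
int main(void) { yylex(); return 0; }
```

Because the character rule allows any number of items between the apostrophes, '\u0041' is scanned as one token (the pair \u followed by four ordinary characters). Whether the escape is actually legal then becomes a question for the action or the parser, which is usually a better place to produce a precise error message.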

Without seeing your entire flex input I cannot tell for sure, but my guess is that you have a fallback rule like . { return *yytext; }. That's all very well, but if you reject character and string literals with invalid escape patterns, such literals will end up invoking the fallback rule, which recognizes only the initial quote (or apostrophe). That will almost certainly produce an error in the parser, but it will be difficult to recover from that error, because you will then be scanning the rest of the string or character literal as though it were unquoted (and will thus end up scanning what follows the closing quote or apostrophe as though it were quoted).
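If you do keep the strict escape pattern, one possible way to avoid that cascade is to add a pair of looser rules after the strict ones, so that a malformed literal is still consumed as a single unit. This is only a sketch: the error messages, the catch-all rule, and main are my own assumptions, and a real scanner would probably return an error token instead of printing.

```lex
%option noyywrap
%{
#include <stdio.h>
%}
/* ES is copied from the question. */
ES  (\\(['"\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))
%%
\"([^"\\\n]|{ES})*\"    { printf("that's string %s\n", yytext); }
'([^'\\\n]|{ES})'       { printf("that's char %s\n", yytext); }
\"([^"\\\n]|\\.)*\"?    { fprintf(stderr, "bad string literal: %s\n", yytext); }
'([^'\\\n]|\\.)*'?      { fprintf(stderr, "bad character literal: %s\n", yytext); }
.|\n                    { /* sketch only: skip everything else */ }
%%
int main(void) { yylex(); return 0; }
```

This relies on how flex breaks ties: the longest match wins, and when two rules match the same text the earlier rule is chosen. A well-formed literal matches both a strict rule and a loose rule with the same length, so the strict rule fires. A literal with an invalid escape, or one missing its closing delimiter, matches only the loose rule, which swallows it up to the closing quote or the end of the line.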
