Using flex for matching Python multiline strings with escaped characters

I wonder how to match Python multi-line comments with flex. I have run into some trouble: the following regex works fine on Regexr, but flex does not recognize it, and I don't know how to fix it.

"""[^"\\]*(?:(?:\\.|"{1,2}(?!"))[^"\\]*)*"""

Previously, I used:

["]{3}(\\["])*(["]{0,2}[^"](\\["])*)*["]{3}

which can detect comments like:

"""A\"""A"""

However, it cannot deal with multiple backslashes (\\), as in

'''A\\\\'''A=B'''C'''

which it recognizes as a single token rather than as:

'''A\\\\'''  (comment)   
A=B     
'''C'''(comment) 

You can recognize Python long strings with a single regex. It's not pretty, but I believe it works:

["]{3}(["]{0,2}([^\\"]|\\(.|\n)))*["]{3}

This is fairly similar to your original regex, but it does not attempt to limit its backslash handling to \\", so it can correctly identify \\\\ as a backslashed character.
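To make the backslash handling concrete, here is a minimal hand-written sketch in C of the same matching logic as the regex above. The function name match_long_string and its quote parameter are hypothetical, introduced only for illustration; flex generates nothing like this by name. The point it shows is that a backslash always consumes the character after it, so pairs of backslashes cancel out before the closing delimiter is looked for.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the long string that starts
 * at s (opening and closing delimiters included), or 0 if s does not
 * start with a complete long string. `quote` is '"' or '\''. */
size_t match_long_string(const char *s, char quote)
{
    size_t i;

    /* The token must open with three quote characters. */
    if (s[0] != quote || s[1] != quote || s[2] != quote)
        return 0;

    i = 3;
    while (s[i] != '\0') {
        if (s[i] == '\\') {
            /* A backslash escapes whatever follows, even another backslash. */
            if (s[i + 1] == '\0')
                return 0;            /* dangling backslash, no match */
            i += 2;
        } else if (s[i] == quote && s[i + 1] == quote && s[i + 2] == quote) {
            return i + 3;            /* first unescaped triple quote closes */
        } else {
            i += 1;                  /* ordinary character or a lone quote */
        }
    }
    return 0;                        /* unterminated string */
}
```

Called on the question's example '''A\\\\'''A=B'''C''' with quote set to the single-quote character, this returns 11, the length of the first string only, which is the split the question asks for.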

A possibly easier to read (but slightly slower) solution is to use a start condition. Here I use yymore() to create a single token which does not include the """ delimiters, but production code would probably seek to interpret Python's various backslash escapes. (It is precisely this need which motivates the use of a start condition rather than trying to recognize the entire string with a single regex.)

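%{
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memcpy */
/* yylval and TOKEN_STRING are assumed to come from the parser's header. */
%}
%x SC_LONGSTRING
%%
["]{3}     BEGIN(SC_LONGSTRING);   /* opening delimiter, not kept in the token */
<SC_LONGSTRING>{
  [^\\"]+  yymore();               /* ordinary characters */
  \\(.|\n) yymore();               /* backslash plus any character, newline included */
  ["]["]?  yymore();               /* one or two quotes that do not close the string */
  ["]{3}   { BEGIN(INITIAL);
             /* yytext now holds the body followed by the closing delimiter;
                copy everything except those last three characters. */
             yylval.str = malloc(yyleng - 2);
             memcpy(yylval.str, yytext, yyleng - 3);
             yylval.str[yyleng - 3] = 0;
             return TOKEN_STRING;
           }
}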
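As a rough illustration of the escape interpretation mentioned above, the following C sketch decodes a handful of Python's backslash escapes from the raw token text. The name decode_escapes is hypothetical, and the set of escapes handled is deliberately incomplete (octal, hex and Unicode escapes are left out), so treat it as a starting point under those assumptions rather than a full implementation.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated, NUL-terminated copy of
 * src[0..len) with a few backslash escapes decoded, or NULL if malloc
 * fails. The caller frees the result. */
char *decode_escapes(const char *src, size_t len)
{
    /* Decoding never lengthens the text, so len + 1 bytes always suffice. */
    char *out = malloc(len + 1);
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;

    while (i < len) {
        if (src[i] == '\\' && i + 1 < len) {
            char c = src[i + 1];
            i += 2;
            switch (c) {
            case 'n':  out[o++] = '\n'; break;
            case 't':  out[o++] = '\t'; break;
            case '\\': out[o++] = '\\'; break;
            case '\'': out[o++] = '\''; break;
            case '"':  out[o++] = '"';  break;
            case '\n': break;   /* backslash-newline joins lines, emits nothing */
            default:            /* unrecognized escape: keep both characters */
                out[o++] = '\\';
                out[o++] = c;
                break;
            }
        } else {
            out[o++] = src[i++];   /* ordinary character or trailing backslash */
        }
    }
    out[o] = '\0';
    return out;
}
```

In the scanner, such a helper could be applied to yytext in the closing-delimiter action in place of the plain memcpy, passing yyleng - 3 as the length.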
