
Compile a regex that will automatically escape or ignore special characters

I'm using the results of one regex to build another regex, more or less like this:

import re

regex = r'(?P<prev>.+?)(?P<hook>%%.+?%%)(?P<next>.+?$)'
match = re.search(regex, content, re.S)

# The captured text is concatenated into the new pattern unescaped
comparisonRegex = (match.group('prev') +
    '(?P<desiredContent>desireable)' + match.group('next'))
match = re.search(comparisonRegex, otherContent, re.S)

This approach works fine, but sometimes it throws this error:

  File "/path/to/my/script/refactor_static.py", line 92, in dynamicContent
    match = re.search(comparisonRegex, crawlFileContent, re.S)
  File "/usr/lib/python2.7/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

I'm fairly confident this is because the content I'm searching through and using as a new regex has invalid characters or sequences in it, but I'm not sure how to approach this. Is there an argument I can pass that will essentially tell it to compile all the characters as literals and not as special characters? So far I haven't been able to find anything in the Python regex guide.
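For illustration, here is a minimal sketch of how this kind of error can arise. The sample text is hypothetical (it is not the actual file content), and `compiles` is a helper made up for this example; the point is only that captured text such as `[z-a]` is read as a character class when reused as a pattern.

```python
import re

# Hypothetical captured text; "[z-a]" is read as a reversed character range
captured = "items [z-a] "

def compiles(pattern):
    """Return True if pattern is a valid regex, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Concatenating the raw text into a pattern makes the whole pattern invalid
raw_ok = compiles(captured + "(?P<desiredContent>desireable)")
print(raw_ok)
```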

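Use re.escape on the matched text before building the new pattern, so that every character in it is treated as a literal: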

import re

regex = r'(?P<prev>.+?)(?P<hook>%%.+?%%)(?P<next>.+?$)'
match = re.search(regex, content, re.S)

# Escape the captured text so its characters are matched literally
comparisonRegex = (re.escape(match.group('prev')) +
    '(?P<desiredContent>desireable)' + re.escape(match.group('next')))
match = re.search(comparisonRegex, otherContent, re.S)
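As a rough end-to-end check, the snippet above can be wrapped in a small function and run on sample input. The function name `build_comparison_regex` and both sample strings are assumptions made up for this sketch; only `re.search`, `re.escape`, and `re.S` come from the standard library.

```python
import re

def build_comparison_regex(content):
    """Build a pattern from the text around a %%hook%% marker, escaping that text."""
    match = re.search(r'(?P<prev>.+?)(?P<hook>%%.+?%%)(?P<next>.+?$)', content, re.S)
    if match is None:
        return None  # no %%...%% marker with text on both sides
    return (re.escape(match.group('prev'))
            + '(?P<desiredContent>desireable)'
            + re.escape(match.group('next')))

# Hypothetical inputs containing regex metacharacters: [ ] ( ) + ?
content = "price [z-a] (USD): %%hook%% + tax?"
other_content = "price [z-a] (USD): desireable + tax?"

pattern = build_comparison_regex(content)
match = re.search(pattern, other_content, re.S)
print(match.group('desiredContent'))
```

Without the two re.escape calls, the same sample input would fail to compile because of the `[z-a]` fragment.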
