
Match only non-quoted words using a regex in Python

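While trying to process some code, I need to find instances where variables from a specific list are used. The problem is that the code is obfuscated, and those variable names can also appear inside strings, for example, which I do not want to match.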

However, I have not been able to find a regex that matches only non-quoted words and works in Python...

"[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\w+)"

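This should match every non-quoted word with the last group (the sixth group, index 5 with 0-based indexing). Minor modifications are required to avoid matching strings which begin with quotes.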
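One way to read the caveat about strings that begin with quotes: the pattern needs one non-backslash character in front of an opening quote, so a quoted string at the very start of the input is not recognised as a string. The sketch below illustrates that reading. The helper name `unquoted_words` and the `pad` parameter are hypothetical, and padding the input with a leading space is only one possible workaround, not necessarily the modification the answer has in mind.

```python
import re

# The answer's pattern; \w is spelled \\w here so the non-raw literal
# contains no invalid escape sequence. The regex itself is unchanged.
NON_QUOTED_WORDS = "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"

def unquoted_words(text, pad=True):
    """Return the words captured by the last group (index 5 in each tuple).

    With pad=True a space is prepended (an assumption of this sketch) so that
    a quote at position 0 has a preceding character for [^\\] to consume.
    """
    if pad:
        text = " " + text
    return [groups[5] for groups in re.findall(NON_QUOTED_WORDS, text) if groups[5]]

print(unquoted_words('"foo" bar', pad=False))  # ['foo', 'bar']
print(unquoted_words('"foo" bar'))             # ['bar']
```

Without padding, `foo` is reported as an unquoted word even though it sits inside a string; with padding only `bar` is returned.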

Explanation:

[^\\\\] matches any character except the escape character (a backslash). Escaped quotes do not start a string.
((\")|(')) immediately after that non-escape character, matches either " or ', which starts a string. This is group 1, which contains groups 2 (\") and 3 (').
(?(2) tests whether group 2 (a double quote) matched. If it did:
    ([^\"]|\\\")*| matches anything but double quotes, or escaped double quotes. Otherwise:
    ([^']|\\')*) matches anything but single quotes, or escaped single quotes.
        If you wish to retrieve the string inside the quotes, you will have to add another group: (([^\"]|\\\")*) lets you retrieve the whole consumed string, rather than just the last matched character.
        Note that the last character of a quoted string is actually consumed by the final [^\\\\]. To retrieve it, turn it into a group: ([^\\\\]). Additionally, the character just before the opening quote is also consumed by the first [^\\\\], which may matter in cases such as r"Raw\text".
[^\\\\]\\1 matches any non-escape character followed by whatever the first group matched. That is, if ((\")|(')) matched a double quote, we require a double quote to end the string; otherwise it matched a single quote, which is what we require to end the string.
|(\w+) matches any word. It only matches non-quoted words, because quoted strings are consumed by the preceding alternative.
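The `(?(2)...|...)` construct is probably the least familiar part of the pattern, so here is a minimal, isolated sketch of a conditional group in Python's `re` module. The pattern and the helper `is_balanced_number` are made up for illustration and are not part of the answer: a closing parenthesis is required only when an opening one was captured by group 1.

```python
import re

# Hypothetical pattern, not from the answer: digits that may be wrapped in
# parentheses. (?(1)\)) means "if group 1 matched, require a closing paren".
BALANCED = re.compile(r"(\()?\d+(?(1)\))")

def is_balanced_number(text):
    """True if text is digits, optionally wrapped in one matched pair of parentheses."""
    return BALANCED.fullmatch(text) is not None

for sample in ["12", "(12)", "(12", "12)"]:
    print(sample, is_balanced_number(sample))
# 12 True
# (12) True
# (12 False
# 12) False
```

The answer's pattern uses the same construct, keyed on group 2, to choose between the double-quote and single-quote bodies.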

For example:

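import re

# Same regex as above; \w is written \\w so the non-raw literal has no invalid escape.
non_quoted_words = "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"
# Sample text mixing double-quoted, single-quoted and escaped quotes.
quote = "This \"is an example ' \\\" of \" some 'text \\\" like wtf' \\\" is what I said."
print(quote)
# Each result is a 6-tuple of groups; unquoted words are in the last element.
print(re.findall(non_quoted_words, quote))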

This returns:

This "is an example ' \" of " some 'text \" like wtf' \" is what I said.
[('', '', '', '', '', 'This'), ('"', '"', '', 'f', '', ''), ('', '', '', '', '', 'some'), ("'", '', "'", '', 't', ''), ('', '', '', '', '', 'is'), ('', '', '', '', '', 'what'), ('', '', '', '', '', 'I'), ('', '', '', '', '', 'said')]
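To get from these tuples back to the original goal (finding uses of variables from a given list outside strings), one option is to iterate with `finditer`, keep only matches where the last group took part, and filter by name. This is a rough sketch under that assumption; `find_variable_uses` and the sample input are hypothetical names chosen for illustration.

```python
import re

# The answer's pattern, unchanged apart from spelling \w as \\w.
NON_QUOTED_WORDS = re.compile(
    "[^\\\\]((\")|('))(?(2)([^\"]|\\\")*|([^']|\\')*)[^\\\\]\\1|(\\w+)"
)

def find_variable_uses(code, names):
    """Return (name, offset) pairs for unquoted words in `code` that are in `names`."""
    wanted = set(names)
    uses = []
    for match in NON_QUOTED_WORDS.finditer(code):
        word = match.group(6)  # None when the quoted-string branch matched
        if word is not None and word in wanted:
            uses.append((word, match.start(6)))
    return uses

sample = 'x = foo + "foo bar" + baz'
print(find_variable_uses(sample, ["foo", "baz"]))  # [('foo', 4), ('baz', 22)]
```

The `foo` inside the string literal is skipped. One thing worth checking on real input: in the Python literal, `\\\"` becomes the regex `\"`, which matches a bare double quote, so the quoted-string branch is greedy and runs to the last matching unescaped quote. In an input such as `a "x" b "y" c`, the word `b` between the two strings is swallowed and only `a` and `c` are reported.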

