Regex match all words except those between quotes
In this example I want to select all words except those between quotes (i.e. "results", "items", "packages", "settings" and "build_type", but not "compiler.version").
results[0].items[0].packages[0].settings["compiler.version"]
results[0].items[0].packages[0].settings.build_type
Here's what I know: I can target all words with
[a-z_]+
and then target what's in between quotes with this:
(?<=\")[\w.]+(?=\")
Is there any way to match the difference between the results of the first and second regex (i.e. words, except when they are surrounded by double quotes)?
Here's a regex playground with the example, for convenience.
I believe this is the cleaner/simpler version of the solution you were searching for:
(?<!\")\b[a-z_]+\b(?!\")
Please let me know if this was helpful/if this was what you wanted!
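As a quick check, the lookaround pattern above can be run against the two sample lines from the question. This is only a sketch: the name OUTSIDE_QUOTES and the three-part key "a.b.c" are made up for illustration. The lookarounds only inspect the characters immediately next to a word, so a middle segment of a longer quoted key would still be matched.

```python
import re

# Pattern from the answer above; escaping the quotes is optional in Python.
OUTSIDE_QUOTES = re.compile(r'(?<!")\b[a-z_]+\b(?!")')

samples = [
    'results[0].items[0].packages[0].settings["compiler.version"]',
    'results[0].items[0].packages[0].settings.build_type',
]
matches = [OUTSIDE_QUOTES.findall(s) for s in samples]
print(matches[0])  # ['results', 'items', 'packages', 'settings']
print(matches[1])  # ['results', 'items', 'packages', 'settings', 'build_type']

# Hypothetical input: "b" touches neither quote, so it slips through.
edge_case = OUTSIDE_QUOTES.findall('settings["a.b.c"]')
print(edge_case)  # ['settings', 'b']
```

If quoted keys can have more than two dot-separated parts, one of the other approaches on this page is probably safer.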
You can match strings between double quotes and then match and capture words optionally followed by dot-separated words:
list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I)))
See the regex demo. Details:
"[^"]*" - a " char, zero or more chars other than ", and then a " char
| - or
([a-z_]\w*(?:\.[a-z_]\w*)*) - Group 1: a letter or underscore followed by zero or more word chars, and then zero or more sequences of a . followed by a letter or underscore and zero or more word chars.
See the Python demo:
import re
text = 'results[0].items[0].packages[0].settings["compiler.version"] '
print(list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I))))
# => ['results', 'items', 'packages', 'settings']
The re.ASCII option is used to make \w match [a-zA-Z0-9_] without accounting for Unicode chars.
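Note that on the question's second sample line, the optional dotted part of Group 1 makes settings.build_type come back as a single match. If separate words are wanted, a variant without that part might look like the sketch below. The names ORIGINAL, SINGLE_WORDS and unquoted are hypothetical, introduced only for this comparison.

```python
import re

ORIGINAL = r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)'
# Assumed variant: same idea, but Group 1 stops at the end of each word.
SINGLE_WORDS = r'"[^"]*"|([a-z_]\w*)'

def unquoted(pattern, text):
    """Return the non-empty Group 1 captures, i.e. words outside quotes."""
    return [m for m in re.findall(pattern, text, re.ASCII | re.I) if m]

line1 = 'results[0].items[0].packages[0].settings["compiler.version"]'
line2 = 'results[0].items[0].packages[0].settings.build_type'

print(unquoted(ORIGINAL, line2))
# ['results', 'items', 'packages', 'settings.build_type']
print(unquoted(SINGLE_WORDS, line2))
# ['results', 'items', 'packages', 'settings', 'build_type']
print(unquoted(SINGLE_WORDS, line1))
# ['results', 'items', 'packages', 'settings']
```

Quoted strings are still consumed by the first alternative, which leaves Group 1 empty for them, so the filtering step works the same way in both versions.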
A word is not within a double-quoted substring if and only if it is followed on its line by an even number of double-quotes (assuming the line is properly formatted and therefore contains an even number of double-quotes). You can use the following regular expression to match words that are not contained within double-quoted substrings.
[a-z_]+(?=(?:(?:[^\"\n]*\"){2})*[^\"\n]*\n)
The regular expression can be broken down as follows (alternatively, hover the cursor over each part of the expression at the link to obtain an explanation of its function).
[a-z_]+           # match one or more of the indicated characters
(?=               # begin a positive lookahead
  (?:             # begin an outer non-capture group
    (?:           # begin an inner non-capture group
      [^\"\n]*    # match zero or more characters other than " and \n
      \"          # match "
    ){2}          # end inner non-capture group and execute twice
  )*              # end outer non-capture group and execute zero or more times
  [^\"\n]*        # match zero or more characters other than " and \n
  \n              # match a newline
)                 # end positive lookahead
\n should be replaced by (?:\n|$) if the last line may not have a line terminator.
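A minimal Python sketch of this even-number-of-quotes approach, using the (?:\n|$) ending so that a final line without a terminator is handled. The variable names are illustrative only, and the input is assumed to have balanced quotes on every line.

```python
import re

# A word matches only if an even number of double-quotes follows it
# before the end of its line.
PATTERN = re.compile(r'[a-z_]+(?=(?:(?:[^"\n]*"){2})*[^"\n]*(?:\n|$))')

text = (
    'results[0].items[0].packages[0].settings["compiler.version"]\n'
    'results[0].items[0].packages[0].settings.build_type'
)

words = PATTERN.findall(text)
print(words)
# ['results', 'items', 'packages', 'settings',
#  'results', 'items', 'packages', 'settings', 'build_type']
```

Because the lookahead scans to the end of the line for every candidate word, this may be slower than the other approaches on very long lines.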