
Regex match all words except those between quotes

In this example I want to select all words except those between quotes (i.e. match "results", "items", "packages", "settings" and "build_type", but not "compiler.version").

results[0].items[0].packages[0].settings["compiler.version"] 
results[0].items[0].packages[0].settings.build_type

Here's what I know: I can target all words with

[a-z_]+

and then target what's in between quotes with this:

(?<=\")[\w.]+(?=\")

Is there any way to match the difference between the results of the first and second regex (i.e. words, except if they are surrounded by double quotes)?

Here's a regex playground with the example for convenience.

I believe this is the cleaner/simpler version of the solution you were searching for:

(?<!\")\b[a-z_]+\b(?!\")

Here's a demo.
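As a quick check, here is a minimal Python sketch that runs the pattern above against the two sample lines from the question. The variable names are illustrative only, and the backslashes before the quotes are dropped because they are not needed inside a raw string.

```python
import re

# Pattern from the answer above: a run of lowercase letters/underscores that is
# not immediately preceded or followed by a double quote.
pattern = re.compile(r'(?<!")\b[a-z_]+\b(?!")')

lines = [
    'results[0].items[0].packages[0].settings["compiler.version"]',
    'results[0].items[0].packages[0].settings.build_type',
]

matches = [pattern.findall(line) for line in lines]
for line, found in zip(lines, matches):
    print(line, '->', found)
```

Note that the lookarounds only inspect the characters directly next to each word, so a quoted key with three or more dot-separated parts (for example "a.b.c") would still yield its middle part.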

Please let me know if this was helpful/if this was what you wanted!

You can match strings between double quotes and then match and capture words optionally followed by dot-separated words:

list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I)))

See the regex demo. Details:

  • "[^"]*" - a " char, zero or more chars other than " and then a " char
  • | - or
  • ([a-z_]\w*(?:\.[a-z_]\w*)*) - Group 1: a letter or underscore followed by zero or more word chars, and then zero or more sequences of a . followed by a letter or underscore and zero or more word chars.

See the Python demo:

import re
text = 'results[0].items[0].packages[0].settings["compiler.version"] '
print(list(filter(None, re.findall(r'"[^"]*"|([a-z_]\w*(?:\.[a-z_]\w*)*)', text, re.ASCII | re.I))))
# => ['results', 'items', 'packages', 'settings']

The re.ASCII option is used to make \w match [a-zA-Z0-9_] without accounting for Unicode chars.
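To illustrate the effect of the flag, here is a small sketch comparing \w with and without re.ASCII. The sample string is made up for the illustration and is not part of the original answer.

```python
import re

# Hypothetical sample containing the non-ASCII letter U+00EF (i with diaeresis).
text = 'na\u00efve_name'

# Default (Unicode) mode: \w also matches non-ASCII letters.
unicode_words = re.findall(r'\w+', text)

# ASCII mode: \w is limited to [a-zA-Z0-9_], so the word is split at U+00EF.
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)
print(ascii_words)
```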

A word is not within a double-quoted substring if and only if it is followed in the string by an even number of double quotes (assuming the string is properly formatted and therefore contains an even number of double quotes). You can use the following regular expression to match strings that are not contained within double-quoted substrings.

[a-z_]+(?=(?:(?:[^\"\n]*\"){2})*[^\"\n]*\n)

Demo

The regular expression can be broken down as follows (alternatively, hover the cursor over each part of the expression at the link to obtain an explanation of its function).

[a-z_]+         # match one or more of the indicated characters
(?=             # begin a positive lookahead
  (?:           # begin an outer non-capture group
    (?:         # begin an inner non-capture group
      [^\"\n]*  # match zero or more characters other than " and \n 
      \"        # match "
    ){2}        # end inner non-capture group and execute twice
  )*            # end outer non-capture group and execute zero or more times
  [^\"\n]*      # match zero or more characters other than " and \n 
  \n            # match a newline
)               # end positive lookahead

\n should be replaced by (?:\n|$) if the last line may not have a line terminator.
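The even-number-of-quotes idea can be tried in Python with a short sketch like the one below. It uses the (?:\n|$) variant because the sample text, taken from the question, has no trailing newline; the variable names are illustrative.

```python
import re

# A word matches only if an even number of double quotes follows it
# on the same line (i.e. the word is outside any quoted substring).
pattern = re.compile(r'[a-z_]+(?=(?:(?:[^"\n]*"){2})*[^"\n]*(?:\n|$))')

text = (
    'results[0].items[0].packages[0].settings["compiler.version"]\n'
    'results[0].items[0].packages[0].settings.build_type'
)

words = pattern.findall(text)
print(words)
```

Unlike a check on the characters adjacent to each word, this approach also rejects every part of a quoted key, at the cost of scanning to the end of the line for each candidate word.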


