Regular expression: match word not between quotes

I would like a Python regular expression that matches a given word that is not between single quotes. I've tried to use (?! ...) but without success.

In the sample text below, I would like to match every foe except the one inside the quotes on the 4th line.

Also, the text is given as one big string.

The sample is also on regex101, and the sample text is below:

var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe

You can try this:

((?!\'[\w\s]*)foe(?![\w\s]*\'))

How about this regular expression:

>>> import re
>>> s = '''var foe = 10;
foe = "";
dark_vador = 'bad guy'
' I\m your father, foe ! '
bar = thingy + foe'''
>>>
>>> re.findall(r'(?!\'.*)foe(?!.*\')', s)
['foe', 'foe', 'foe']

The trick here is to make sure the expression does not match foe when it has a leading and a trailing ' around it, and to account for the characters in between, hence the .* in the expression.
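One caveat is worth checking with a small sketch. The trailing lookahead only asks whether a quote appears later on the same line, so it also rejects a foe that sits before an opening quote. The snippet below runs the same pattern against the question's original sample text (the names text, pattern and matched_lines are just for illustration) and reports which lines produce a match.

```python
import re

# The question's sample text; the 4th line has one foe outside the quotes
# and one inside them.
text = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

pattern = re.compile(r"(?!\'.*)foe(?!.*\')")

# 1-based line number of every match.
matched_lines = [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]
print(matched_lines)  # [1, 2, 5]
```

Line 4 yields no match at all, so the foe on the left-hand side of the assignment is skipped along with the quoted one.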

((?!\'[\w\s]*[\\']*[\w\s]*)foe(?![\w\s]*[\\']*[\w\s]*\'))

The regex solution below will work in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.

A common regex trick for matching strings in context is to match what you need to replace, and to match and capture what you need to keep.

Here is a sample Python demo:

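import re

# Group 1 captures a whole single-quoted literal (escaped characters allowed);
# the second alternative matches the target word anywhere else.
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
    var foe = 10;
    foe = "";
    dark_vador = 'bad guy'
    foe = ' I\'m your father, foe ! '
    bar = thingy + foe"""
toReplace = "foe"
# Put quoted literals back unchanged; replace the word everywhere else.
res = re.sub(rx.format(toReplace), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)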

With foe substituted in, the regex looks like this:

('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b

The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; when it matches, the literal is put back into the result unchanged. The \bfoe\b part matches the whole word foe in any other context, and that match is replaced with another word.
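If you want to find the matches instead of replacing them, the same match-and-capture idea can be sketched as follows. The helper name find_outside_single_quotes is hypothetical, and re.escape is added on the assumption that the word may contain regex metacharacters.

```python
import re

def find_outside_single_quotes(word, text):
    """Return start offsets of `word` that are outside single-quoted literals."""
    rx = re.compile(r"('[^'\\]*(?:\\.[^'\\]*)*')|\b" + re.escape(word) + r"\b")
    # A match with Group 1 set is a quoted literal, so it is skipped.
    return [m.start() for m in rx.finditer(text) if m.group(1) is None]

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

offsets = find_outside_single_quotes("foe", sample)
lines = [sample.count("\n", 0, i) + 1 for i in offsets]
print(lines)  # [1, 2, 4, 5]
```

The quoted literal on line 4 is consumed as a single Group 1 match, so the foe inside it is never tested on its own.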

NOTE: To also match double-quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")" as the capturing group.
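A minimal sketch of that variant, assuming the extended group is simply followed by the same |\bfoe\b alternative as before (the sample text and the QUOTED name are made up for illustration):

```python
import re

# Group 1 now covers both single- and double-quoted literals.
QUOTED = r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")"
rx = re.compile(QUOTED + r"|\bfoe\b")

sample = r'''foe = "my foe"
msg = 'another foe'
bar = thingy + foe'''

# Keep quoted literals as they are; replace the word everywhere else.
result = rx.sub(lambda m: m.group(1) if m.group(1) else "NEWORD", sample)
print(result)
# NEWORD = "my foe"
# msg = 'another foe'
# bar = thingy + NEWORD
```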

Capture group 1 of the following regular expression will contain matches of 'foe'.

r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b"

Python's regex engine performs the following operations.

^           : assert beginning of string
(?:         : begin non-capture group
  [^'\n]    : match any char other than single quote and line terminator
  |         : or
  \\'       : match '\' then a single quote
)           : end non-capture group   
*           : execute non-capture group 0+ times
(?:         : begin non-capture group
  (?<!\\)   : next char is not preceded by '\' (negative lookbehind)
  '         : match single quote
  (?:       : begin non-capture group
    [^'\n]  : match any char other than single quote and line terminator
    |       : or
    \\'     : match '\' then a single quote
  )         : end non-capture group   
  *         : execute non-capture group 0+ times
  (?:       : begin non-capture group
    (?<!\\) : next char is not preceded by '\' (negative lookbehind)
    '       : match single quote
  )         : end non-capture group
  (?:       : begin non-capture group
    [^'\n]  : match any char other than single quote and line terminator
    |       : or
    \\'     : match '\' then a single quote
  )         : end non-capture group   
  *         : execute non-capture group 0+ times
)           : end non-capture group
*           : execute non-capture group 0+ times
\b(foe)\b   : match 'foe' in capture group 1
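To try this pattern on the multi-line sample, here is a small sketch. It assumes the re.MULTILINE flag, which the answer does not mention: the pattern starts with ^ and never consumes a newline, so without that flag it can only match on the first line of the string.

```python
import re

pattern = re.compile(
    r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b",
    re.MULTILINE,  # assumption: lets ^ match at the start of every line
)

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

# Locate each captured word via Group 1 and report its 1-based line number.
lines = [sample.count("\n", 0, m.start(1)) + 1 for m in pattern.finditer(sample)]
print(lines)
```

Because every match is anchored at the start of a line, this approach finds at most one occurrence per line.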
