
Regular expression: match word not between quotes

I would like a Python regular expression that matches a given word when it is not between single quotes. I've tried to use a negative lookahead, (?! ...), but without success.

In the sample below, I would like to match every foe except the one inside the quoted string on the 4th line.

Plus, the text is given as one big string.

Here is the regex101 link; the sample text is below:

var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe

You can try this:

((?!\'[\w\s]*)foe(?![\w\s]*\'))

How about this regular expression:

>>> import re
>>> s = r'''var foe = 10;
foe = "";
dark_vador = 'bad guy'
' I\m your father, foe ! '
bar = thingy + foe'''
>>>
>>> re.findall(r'(?!\'.*)foe(?!.*\')', s)
['foe', 'foe', 'foe']

The trick is to make sure the expression does not match foe when it sits between a leading and a trailing ', and to account for the characters in between, hence the .* in the lookaheads.
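
To see how far the lookahead idea gets, here is a small sketch that runs it on the question's original sample, where line 4 starts with an unquoted foe. It is an illustration, not part of the answer above; the names text, pattern and matched_lines are made up for the example. The leading (?!\'.*) is left out because it is tested where the next character is always f, so it can never reject anything.

```python
import re

# Sample text from the question (raw string so the backslash stays literal).
text = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

# Reject "foe" whenever a single quote appears later on the same line
# ("." does not cross line breaks by default).
pattern = re.compile(r"foe(?!.*')")

# 1-based line number of every match.
matched_lines = [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]
print(matched_lines)  # [1, 2, 5]
```

Line 4 is skipped entirely: the unquoted foe at its start is also followed by a quote later on the line, so the lookahead rejects it along with the quoted one.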


((?!\'[\w\s]*[\\']*[\w\s]*)foe(?![\w\s]*[\\']*[\w\s]*\'))

The regex solution below will work in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.

A usual regex trick for matching strings in context is to match what you need to replace, and to match and capture what you need to keep.

Here is a sample Python demo:

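import re

# Group 1 captures a whole single-quoted literal (backslash escapes allowed);
# the second alternative matches the target word anywhere else.
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
    var foe = 10;
    foe = "";
    dark_vador = 'bad guy'
    foe = ' I\'m your father, foe ! '
    bar = thingy + foe"""
toReplace = "foe"
# Put string literals back unchanged; replace the word everywhere else.
res = re.sub(rx.format(re.escape(toReplace)), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)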

See the Python demo

The regex will look like

('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b

See the regex demo.

The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; if it matches, it is simply put back into the result. \bfoe\b matches the whole word foe in any other context, and that match is subsequently replaced with another word.
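
Since the question asks for matches rather than a replacement, the same capture-and-skip idea can be sketched with re.finditer: keep only the matches where Group 1 did not participate. This is a minimal sketch; the helper name find_outside_single_quotes is hypothetical and not part of the answer above.

```python
import re

text = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

def find_outside_single_quotes(word, s):
    """Return the start offsets of `word` found outside '...' literals."""
    rx = re.compile(r"('[^'\\]*(?:\\.[^'\\]*)*')|\b" + re.escape(word) + r"\b")
    # A match with Group 1 set is a string literal: skip it.
    return [m.start() for m in rx.finditer(s) if m.group(1) is None]

offsets = find_outside_single_quotes("foe", text)
lines = [text.count("\n", 0, pos) + 1 for pos in offsets]
print(lines)  # [1, 2, 4, 5]
```

The unquoted foe at the start of line 4 is found, while the one inside the quoted string is consumed as part of the Group 1 literal.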

NOTE: To also match double-quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")" as the Group 1 part.
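
A possible way to wire the two-quote variant into the replacement is sketched below. The helper names literal and replace_outside_strings, the sample string and the replacement word "ally" are all made up for the illustration; the pattern is built piece by piece only to avoid quoting headaches in the Python source.

```python
import re

def literal(q):
    """Regex for a string literal delimited by quote char `q`, with \\-escapes."""
    return q + r"[^" + q + r"\\]*(?:\\.[^" + q + r"\\]*)*" + q

# Group 1 captures either a '...' or a "..." literal.
STRING_LITERAL = "(" + literal("'") + "|" + literal('"') + ")"

def replace_outside_strings(word, replacement, s):
    rx = re.compile(STRING_LITERAL + r"|\b" + re.escape(word) + r"\b")
    # Keep literals as they are; swap the word everywhere else.
    return rx.sub(lambda m: m.group(1) if m.group(1) is not None else replacement, s)

sample = r'''foe = "the foe said \"hi\"" + foe + 'old foe';'''
result = replace_outside_strings("foe", "ally", sample)
print(result)  # ally = "the foe said \"hi\"" + ally + 'old foe';
```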

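Capture group 1 of the following regular expression will contain the matches of 'foe'. Because the pattern is anchored with ^ and never crosses a line break, use the re.MULTILINE flag when the text is given as one multi-line string.

r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b"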

Start your engine!

Python's regex engine performs the following operations.

^           : assert beginning of string
(?:         : begin non-capture group
  [^'\n]    : match any char other than single quote and line terminator
  |         : or
  \\'       : match '\' then a single quote
)           : end non-capture group   
*           : execute non-capture group 0+ times
(?:         : begin non-capture group
  (?<!\\)   : next char is not preceded by '\' (negative lookbehind)
  '         : match single quote
  (?:       : begin non-capture group
    [^'\n]  : match any char other than single quote and line terminator
    |       : or
    \\'     : match '\' then a single quote
  )         : end non-capture group   
  *         : execute non-capture group 0+ times
  (?:       : begin non-capture group
    (?<!\\) : next char is not preceded by '\' (negative lookbehind)
    '       : match single quote
  )         : end non-capture group
  (?:       : begin non-capture group
    [^'\n]  : match any char other than single quote and line terminator
    |       : or
    \\'     : match '\' then a single quote
  )         : end non-capture group   
  *         : execute non-capture group 0+ times
)           : end non-capture group
*           : execute non-capture group 0+ times
\b(foe)\b   : match 'foe' in capture group 1
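
The walkthrough above can be tried out with a short sketch like the following, which applies the pattern to the question's sample with re.MULTILINE and reports each Group 1 hit as a (line, column) pair. The variable names are illustrative only. Note that, being anchored at the start of a line, the pattern yields at most one match per line.

```python
import re

text = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

rx = re.compile(
    r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b",
    re.MULTILINE,  # let ^ match at the start of every line
)

hits = []
for m in rx.finditer(text):
    start = m.start(1)                                # offset of the captured word
    line_no = text.count("\n", 0, start) + 1          # 1-based line number
    col = start - (text.rfind("\n", 0, start) + 1)    # 0-based column in that line
    hits.append((line_no, col))

print(hits)  # [(1, 4), (2, 0), (4, 0), (5, 15)]
```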
