简体   繁体   中英

Replace multiple patterns inside the same boundaries in Python

Text to search:

{ Field1:Value Field2:Value Field1:Value }

I want then, to find, and replace, each instance of Field when it equals a certain value, but only when the field lies within the { & } boundaries.

Note: There can be variable whitespace after the { but before the first Field. And a varying number of fields.

A regex of:

({\s*)(?:Field1)(:(?:.*?)})

Allows me to split the text into groups, allowing me to recreate new text with a different field name.

eg:

\1Field3\2

However, this will only match the first instance of Field1 and ignores the second, as the regex engine continues from the closing }

I thought about then using Lookbehind and Lookahead, buy Python doesn't support variable repetition for these methods, so that was out.

The "re.sub" method returns the text, altered if the regex was found/replaced, but doesn't actually say whether or not a replace was done, so I can't even loop until no matches found, unless I do a second regex to verify which doesn't feel right.

Is there any way I can do this in a single regex?

----- EDIT -----

I managed to distill both Emmanuel & Ωmega's solutions into what I was looking for, albeit still in multiple stages, however I think that in this case, multiple steps is the only solution.

Emmanuel's Code (working for my solution)

s = '{ Field1:Value Field2:Value Field1:Value } Field1:Value {Field1:Value}'
for insider in re.findall(r'{\s*Field1:.*?}', s):
    new = re.sub(r'Field1:', r'NewField:', insider)
    s = s.replace(insider, new)

Ωmega's Code (working for my solution)

def evaluate(m):
  return re.sub('Field1:', 'NewField:', m.group(0))
input = '{ Field1:Value Field2:Value Field1:Value } Field1:Value {Field1:Value}'
output = re.sub('{[^{}]*?}', evaluate, input)

Python does support indefinite repetition in lookahead assertions. So, unless you have nested { / } structures (which would be impossible to handle with Python's regex engine), you can simply check whether the next brace that follows is a closing brace:

>>> import re
>>> subject = "{Field1:Value Field2:Value} Field3:Value {Field4:Value}"
>>> re.sub(r"Field(\d+):(\w+)(?=[^{}]*\})", r"NewField\1:\2", subject)
'{NewField1:Value NewField2:Value} Field3:Value {NewField4:Value}'
def evaluate(m):
  return re.sub(r'(\w+):(' + re.escape(value) + r'\b)', field + ":\\2", m.group(0))

output = re.sub(r'\{[^{}]*\}', evaluate, input)

See this demo .

Tim's answer is really great. Mine proceeds in 2 steps:

  • get all content between braces
  • loop on all of them to proceed with the effective replace

This gives:

>>> s = '{ Field1:Value Field2:Value Field1:Value } Field4:Value {Field5:Value    }'
>>> for insider in re.findall(r'{((?:\s*Field\d+:Value\s*)*)}', s):
    new = re.sub(r'\s*Field(\d+):(\w+)\s*', r' NewField\1:\2', insider)
    s = s.replace(insider, new)

>>> s
'{ NewField1:Value NewField2:Value NewField1:Value} Field4:Value { NewField5:Value}'

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM