
Replace string between two strings unless it contains a substring

I have a multiline string with three lines of the following form:

Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3

I wish to replace all text between Text1 and Text3 with Text4, unless the intermediate text contains the character !. Thus, the desired output is:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Let c be the multiline string above. I believe re.sub is the natural choice for this problem, so I tried the following:

c = re.sub(r"Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

However, it replaces every intermediate text with Text4. That is, I get the following output:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text4 Text3

How can I resolve this?

I would phrase this as:

import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

c = re.sub(r"^Text1(?: [^\s!]+)+ Text3$", "Text1 Text4 Text3", c, flags=re.M)
print(c)

This prints:

Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3

Here is an explanation of the regex pattern used:

  • ^ from the start of the line (re.M is multiline mode)
  • Text1 match "Text1"
  • (?: [^\s!]+)+ then match one or more space-separated terms that contain neither whitespace nor !
  • " Text3" match a space followed by "Text3"
  • $ end of the line
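
The breakdown above can also be written into the pattern itself. The following is a minimal sketch using re.VERBOSE, which is not part of the original answer; the second sample line (with two intermediate words) is a made-up input added to show that the repeated group accepts more than one term.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is commented.
# Literal spaces are written as "[ ]" because VERBOSE mode ignores bare whitespace.
pattern = re.compile(
    r"""
    ^Text1              # "Text1" at the start of a line
    (?:[ ][^\s!]+)+     # one or more space-separated terms without "!"
    [ ]Text3$           # a space, then "Text3" at the end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample: the middle line has two intermediate terms.
sample = "Text1 Text2a Text3\nText1 two words Text3\nText1 Text2! Text3"
result = pattern.sub("Text1 Text4 Text3", sample)
print(result)
```

Only the line whose intermediate text contains ! should come through unchanged.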

You don't really need a negative lookahead to achieve your results. Matching any character except ! would do just fine. Modifying your regex as follows fixes the issue:

c = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)

You can play with it online here and understand more about the regex here.
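
One thing worth checking with the negated character class is that [^!] also matches newlines, so a match can run across lines if some line has Text1 but no Text3. The sketch below runs the pattern on the question's data and then on a hypothetical input (the `broken` string is invented for illustration) to show how adding \n to the class keeps each match on one line.

```python
import re

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"
out = re.sub(r"Text1[^!]*?Text3", "Text1 Text4 Text3", c)
print(out)

# Hypothetical input: the first line has no closing Text3.
broken = "Text1 no closing marker\nText1 Text2a Text3"

# [^!] matches "\n", so the match starts on line 1 and ends on line 2.
spans_lines = re.sub(r"Text1[^!]*?Text3", "Text1 Text4 Text3", broken)
print(spans_lines)

# Excluding "\n" as well confines each match to a single line.
per_line = re.sub(r"Text1[^!\n]*?Text3", "Text1 Text4 Text3", broken)
print(per_line)
```

If every line is known to contain both markers, the two variants behave the same.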

Use the lazy .*? pattern so that each match ends at the first Text3 that follows Text1. To leave a line untouched when the intermediate text contains the ! character, add a negative lookahead assertion, (?!...), directly after Text1; it rejects the match if a ! occurs before the next Text3. Without re.DOTALL, . does not match a newline, so each match stays on one line:

import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

# (?!(?:(?!Text3).)*!) fails the match when a "!" appears before the next Text3
c = re.sub(r"Text1(?!(?:(?!Text3).)*!).*?Text3", "Text1 Text4 Text3", c)

print(c)
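
Another way to pair a lazy match with a check for ! is to pass a function to re.sub and make the decision in Python rather than in the pattern. This is only a sketch of that alternative; the helper name `replace_unless_bang` is invented for illustration and does not come from any of the answers.

```python
import re

def replace_unless_bang(match):
    """Return the replacement text, or the original match if it contains '!'."""
    middle = match.group(1)          # text between Text1 and Text3
    if "!" in middle:
        return match.group(0)        # leave this occurrence unchanged
    return "Text1 Text4 Text3"

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

# Without re.DOTALL, "." stops at newlines, so each match stays on one line.
out = re.sub(r"Text1(.*?)Text3", replace_unless_bang, c)
print(out)
```

This keeps the pattern simple at the cost of one function call per match, which may be easier to read when the exclusion rule grows beyond a single character.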

