I have a multiline string with three lines of the following form:
Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3
I wish to replace all text between Text1 and Text3 with Text4, unless the intermediate text contains the character "!". Thus, the desired output is:
Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3
Let c be the multiline string above. I believe re.sub is the natural choice for this problem, so I tried the following:
c = re.sub("Text1(.*?)(?!=\!)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)
However, it replaces every intermediate text with Text4. That is, I get the following output:
Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text4 Text3
How can I resolve this?
I would phrase this as:
import re
c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""
c = re.sub(r"^Text1(?: [^\s!]+)+ Text3$", "Text1 Text4 Text3", c, flags=re.M)
print(c)
This prints:
Text1 Text4 Text3
Text1 Text4 Text3
Text1 Text2! Text3
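For readers who prefer to see the pattern annotated in place, the same regex can be written in verbose mode. This is only a sketch: the name PATTERN is illustrative and not part of the answer, and because re.VERBOSE ignores unescaped whitespace, each literal space is written as [ ].

```python
import re

# PATTERN is an illustrative name. re.VERBOSE ignores unescaped whitespace,
# so literal spaces in the pattern are written as [ ].
PATTERN = re.compile(
    r"""
    ^Text1            # "Text1" at the start of a line
    (?:[ ][^\s!]+)+   # one or more space-separated words without "!"
    [ ]Text3$         # a space, then "Text3" at the end of the line
    """,
    flags=re.M | re.VERBOSE,
)

c = "Text1 Text2a Text3\nText1 Text2b Text3\nText1 Text2! Text3"
result = PATTERN.sub("Text1 Text4 Text3", c)
print(result)
```

Compiling once is mainly useful if the substitution runs on many strings; for a one-off call, the single-line re.sub form does the same job.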
Here is an explanation of the regex pattern used:
^ : from the start of the line (re.M is multiline mode)
Text1 : match "Text1"
(?: [^\s!]+)+ : then match one or more space-separated non-whitespace terms NOT containing !
 Text3 : match a space and "Text3"
$ : end of the line
You don't really need a negative lookahead to achieve your results. Matching any character except ! would do just fine. Modifying your regex as follows fixes the issue:
c = re.sub(r"Text1([^!]*?)Text3", "Text1 Text4 Text3", c, flags=re.DOTALL)
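One caveat that may be worth checking: a negated class such as [^!] also matches line breaks, so if some line contains Text1 but no Text3, the match can run on into the next line. The sketch below uses a made-up input (the "Text1 foo" line is an assumption, not from the question) to compare the pattern with a variant that excludes newlines as well.

```python
import re

# Hypothetical input: the first line has "Text1" but no "Text3".
text = "Text1 foo\nText1 Text2a Text3\nText1 Text2! Text3"

# [^!] also matches "\n", so the first match starts on line 1 and ends on line 2.
crossing = re.sub(r"Text1[^!]*?Text3", "Text1 Text4 Text3", text)

# Excluding "\n" as well keeps every match on a single line.
per_line = re.sub(r"Text1[^!\n]*?Text3", "Text1 Text4 Text3", text)

print(crossing)
print(per_line)
```

For the three-line string in the question both patterns give the same result; the difference only shows up on inputs like the one above.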
The lazy .*? pattern matches as little text as possible before attempting to match the next part of the pattern, but on its own it still matches text that contains !. To skip those lines, you can add a negative lookahead assertion, (?!.*!), right after Text1; it makes the match fail whenever a ! appears later on the same line. Do not pass re.DOTALL here, otherwise . would cross line breaks and the lookahead would see the ! on the last line from every line above it. For example:
import re
c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""
# (?!.*!) rejects the match if "!" occurs anywhere in the rest of the line
c = re.sub(r"Text1(?!.*!).*?Text3", "Text1 Text4 Text3", c)
print(c)
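Another option is to let a replacement function make the decision instead of encoding it in the pattern, since re.sub accepts a callable as its second argument. A minimal sketch, where replace_unless_bang is a hypothetical helper name:

```python
import re

c = """Text1 Text2a Text3
Text1 Text2b Text3
Text1 Text2! Text3"""

def replace_unless_bang(match):
    # Hypothetical helper: keep the original text when the middle part contains "!".
    if "!" in match.group(1):
        return match.group(0)
    return "Text1 Text4 Text3"

# No re.DOTALL here, so "." stops at line breaks and each match stays on one line.
result = re.sub(r"Text1(.*?)Text3", replace_unless_bang, c)
print(result)
```

This keeps the regex itself simple and moves the "unless it contains !" rule into ordinary Python, which may be easier to extend if more exceptions are added later.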