
Python regex - Find and replace the second item of a pair

I've been trying to do the following: given a char like "i", find and replace the second of every pair of "i" (without overlapping).

"I am so irritated with regex. Seriously" -> "I am so rritated wth regex. Seriously". 

I almost found a solution using positive lookbehind, but it's overlapping :(

Can anyone help me?

My best was this (I think) -> "(?<=i).*?(i)"

EDIT: My description is wrong. I am supposed to replace the SECOND item of a pair, so the result should have been: "I am so irrtated with regex. Serously"

Your regex produces overlapping pairs because the lookbehind (?<=i) only checks for an i without consuming it, so the i that ends one match can also open the next pair. You need to use a consuming pattern for non-overlapping matches:

i([^i]*i)

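Replace with the \1 backreference, which holds the text captured by ([^i]*i). The whole match is replaced by Group 1, so the first i of each pair is dropped and the rest of the match is kept.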

The pattern matches:

  • i - a literal i. After matching it, the regex index advances 1 char to the right (the regex engine processes the string from left to right; in re there is no other option).
  • ([^i]*i) - this is Group 1, matching 0+ characters other than i, up to and including the next i. The whole captured value is available as .group(1). After matching it, the regex index sits right after the second i, which was matched and consumed by the pattern. Thus, no overlapping matches occur when the regex engine goes on to look for the remaining matches in the string.
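To see the difference concretely, here is a small sketch that lists where each pattern matches in the sample sentence. The variable names are only for illustration; the two patterns are the one from the question and the one from this answer.

```python
import re

test_str = "I am so irritated with regex. Seriously"

# Pattern from the question: the lookbehind checks for an "i" without consuming it.
lookbehind = re.compile(r"(?<=i).*?(i)")
# Pattern from this answer: the leading "i" is consumed as part of the match.
consuming = re.compile(r"i([^i]*i)")

# Index of the "i" captured in Group 1 by the lookbehind pattern.
lookbehind_hits = [m.start(1) for m in lookbehind.finditer(test_str)]
# Full span of each match of the consuming pattern.
consuming_spans = [m.span() for m in consuming.finditer(test_str)]

print(lookbehind_hits)
print(consuming_spans)
```

With the lookbehind, every lowercase i after the first one ends up in Group 1, because each i that closes one pair is reused to open the next. The consuming pattern yields two matches for the four lowercase i characters, one per pair.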

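Python demo:

import re

pat = "i"
esc = re.escape(pat)  # escape the char in case it is a regex metacharacter
p = re.compile('{0}([^{0}]*{0})'.format(esc))
test_str = "I am so irritated with regex. Seriously"
result = p.sub(r"\1", test_str)  # keep Group 1, drop the leading char
print(result)  # I am so rritated wth regex. Seriously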
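The demo above drops the first i of each pair, which gives the output shown in the original question. For the edited requirement (remove the second item of each pair), one possible variation is to move the capturing group so that it keeps the first i plus the text between the pair. This is only a sketch and not part of the original answer: the helper name drop_second_of_pair is made up for illustration, it assumes a single-character argument, and re.escape is added on the assumption that the character might be a regex metacharacter.

```python
import re

def drop_second_of_pair(char, text):
    """Remove the second occurrence of `char` in every non-overlapping pair."""
    esc = re.escape(char)
    # Group 1 keeps the first char and everything up to (not including) the second.
    pattern = re.compile('({0}[^{0}]*){0}'.format(esc))
    return pattern.sub(r"\1", text)

test_str = "I am so irritated with regex. Seriously"
result = drop_second_of_pair("i", test_str)
print(result)  # I am so irrtated with regex. Serously
```

The pattern still consumes both characters of each pair, so the matches stay non-overlapping; only the part that is kept changes.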


 