
How to use regex to replace a specific group in a string using Python?

I want to read in a string and delete the "&" captured by the group in [^ ]+(&)[^ ], i.e. an "&" that has non-space characters directly on both sides.

x = "apple&bob & john & smith" # original string
x = "applebob & john & smith" # desired result after the replacement

This is the code I am using now.

import re

and_regex = re.compile(r'([^ ]+(&)[^ ])')
x = "apple&bob & john & smith"
x = re.sub(and_regex, " ", x)  # replaces the entire match "apple&b", not just the "&"
print(x)  # prints " ob & john & smith"

I cannot use str.replace because it would replace every "&" in the string.

Thanks for the help!

You can use look-arounds so that only the & itself is matched:

import re
x = "apple&bob & john & smith"
x = re.sub(r"(?<=\S)&(?=\S)", "", x)  # raw string, so "\S" is not read as a string escape
print(x)

Output:

applebob & john & smith
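The look-behind (?<=\S) and look-ahead (?=\S) are zero-width: they check the neighbouring characters without consuming them, so only the & itself is matched and replaced. A small sketch illustrating this with re.finditer (the test strings are illustrative):

```python
import re

# Only "&" is consumed; the look-arounds merely check its neighbours.
pattern = re.compile(r"(?<=\S)&(?=\S)")

s = "apple&bob & john & smith"
for m in pattern.finditer(s):
    # m.group() is just "&"; m.span() is its position in s
    print(repr(m.group()), m.span())  # '&' (5, 6)

# Because the neighbours are not consumed, several "&"s inside one
# token are all removed in a single pass.
print(pattern.sub("", "a&b&c & d"))  # abc & d
```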

As an alternative, if you also want to remove an & at the start or end of a word, for example in "&apple&bob & john & smith&", you can assert a non-whitespace character to the left OR a non-whitespace character to the right:

(?<=\S)&|&(?=\S)


import re

strings = [
    "apple&bob & john & smith",
    "&apple&bob & john & smith&",
    "&apple&bob & john & smith&&"
]

for s in strings:
    print(re.sub(r"(?<=\S)&|&(?=\S)", "", s))

Output:

applebob & john & smith
applebob & john & smith
applebob & john & smith
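To see the difference between the two patterns, here is a short sketch running both on a string with a leading and a trailing &. The "both sides" version keeps those, because there is no character before the first & or after the last one:

```python
import re

both_sides = r"(?<=\S)&(?=\S)"     # non-space on the left AND the right
either_side = r"(?<=\S)&|&(?=\S)"  # non-space on the left OR the right

s = "&apple&bob & john & smith&"
print(re.sub(both_sides, "", s))   # &applebob & john & smith&
print(re.sub(either_side, "", s))  # applebob & john & smith
```

A standalone " & " survives both patterns, since it has whitespace on both sides.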

Alternatively, you can capture the parts you want to keep and insert them back in the replacement string of the .sub() method with \\1 and \\2 (equivalently, the raw string r"\1\2").

import re
pattern = re.compile(r'(\S+)&(\S+)')
# `\S` means: any non-white character.
# see: https://docs.python.org/3/library/re.html

x = "apple&bob & john & smith"
x = pattern.sub("\\1\\2", x) # or also: re.sub(pattern, "\\1\\2", x)

print(x)
## applebob & john & smith
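re.sub() does replace every non-overlapping match, but each match of (\S+)&(\S+) also consumes the non-whitespace characters around the &. A quick check of what a single call does when one token contains several &s (the test strings are illustrative):

```python
import re

pattern = re.compile(r"(\S+)&(\S+)")

# One call: the greedy first group swallows "abble&bob&john",
# so only the last "&" of the token is removed.
print(pattern.sub(r"\1\2", "abble&bob&john&smith"))  # abble&bob&johnsmith

# Separate tokens are separate matches, so each one loses one "&".
print(pattern.sub(r"\1\2", "a&b c&d"))  # ab cd
```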

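However, each match consumes the non-whitespace characters around the &, so a single call removes only one & per whitespace-separated token (with the greedy \S+, the last one). To remove every &, repeat the substitution until the pattern no longer matches, for example with recursion: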

def replace_all_pattern(pattern, s):
    # re.search (not re.match): a match anywhere in s counts,
    # not only one starting at position 0
    if re.search(pattern, s):
        res = re.sub(pattern, r"\1\2", s)
        return replace_all_pattern(pattern, res)
    return s


print(replace_all_pattern(r"(\S+)&(\S+)", "abble&bob&john&smith"))
## abblebobjohnsmith
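Each recursive call adds a stack frame, so a token with very many &s could in principle hit Python's recursion limit (1000 by default). A loop does the same job without that risk. A minimal sketch using re.subn, which also returns the number of substitutions made; the function name replace_all_iterative is hypothetical:

```python
import re

def replace_all_iterative(pattern, s):
    """Hypothetical loop-based variant of replace_all_pattern.

    Repeats the substitution until a pass makes no replacements.
    """
    regex = re.compile(pattern)
    while True:
        # subn returns (new_string, number_of_substitutions_made)
        s, n = regex.subn(r"\1\2", s)
        if n == 0:
            return s

print(replace_all_iterative(r"(\S+)&(\S+)", "abble&bob&john&smith"))  # abblebobjohnsmith
```

Checking the count from subn avoids running the pattern twice per pass, as the re.search plus re.sub pair does.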

But this is less efficient than the look-around pattern: every pass rescans the string, while the look-arounds remove all qualifying &s in a single pass. So use the capture-group approach only when exactly one occurrence is to be replaced; in that case it may perform as well as or better than the look-arounds. As soon as more than one occurrence is possible, use the look-arounds as the pattern.
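The performance claim is easy to check on your own data. A rough timeit sketch, assuming an input with many &s inside one token; the helper names and the test string are hypothetical, and the timings will vary by machine and Python version:

```python
import re
import timeit

lookaround = re.compile(r"(?<=\S)&(?=\S)")
capture = re.compile(r"(\S+)&(\S+)")

def with_lookaround(s):
    # one pass removes every "&" that has non-space on both sides
    return lookaround.sub("", s)

def with_repeated_capture(s):
    # repeat the capture-group substitution until nothing changes
    while True:
        s, n = capture.subn(r"\1\2", s)
        if n == 0:
            return s

# Illustrative input: one long token containing many "&"s.
text = "&".join(["word"] * 200)

# Both approaches should agree before we compare their speed.
assert with_lookaround(text) == with_repeated_capture(text)
for func in (with_lookaround, with_repeated_capture):
    print(func.__name__, timeit.timeit(lambda: func(text), number=100))
```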
