How to use regex to replace a specific group in a string using Python?

Question

I want to read in a string and delete only the & captured by the group in [^ ]+(&)[^ ], i.e. an & that sits directly between two non-space characters.

x = "apple&bob & john & smith"  # original string
x = "applebob & john & smith"   # desired result after the replacement

This is the code I am using now.

import re

and_regex = re.compile(r'([^ ]+(&)[^ ])')
x = "apple&bob & john & smith"
x = re.sub(and_regex, " ",x)
print(x)

I cannot use str.replace because it would replace every "&" in the string, including the free-standing ones.

Thanks for the help!
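For reference, the two approaches ruled out in the question behave like this on the example string. The regex attempt replaces the whole match "apple&b" rather than just the "&", and str.replace removes every "&":

```python
import re

x = "apple&bob & john & smith"

# The attempt above: the entire match "apple&b" is replaced by " ", not just the "&".
print(repr(re.sub(r'([^ ]+(&)[^ ])', " ", x)))  # ' ob & john & smith'

# str.replace removes every "&", leaving double spaces where " & " used to be.
print(repr(x.replace("&", "")))  # 'applebob  john  smith'
```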

Answer 1

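You can use look-arounds so that only the & itself is matched: (?<=\S) requires a non-whitespace character before it and (?=\S) requires one after it.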

import re
x = "apple&bob & john & smith"
x = re.sub(r"(?<=\S)&(?=\S)", "", x)  # raw string, so "\S" is not treated as a string escape
print(x)

output:

applebob & john & smith
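To see why this works, the sketch below lists each match with its position. Because look-arounds are zero-width, every match is exactly one "&" character, and a chain of "&"s between words is handled in a single pass (the chained test string is made up for illustration):

```python
import re

# An "&" with a non-whitespace character on both sides; the look-arounds consume nothing.
pattern = re.compile(r"(?<=\S)&(?=\S)")

x = "apple&bob & john & smith"
for m in pattern.finditer(x):
    print(m.span(), repr(m.group()))  # (5, 6) '&'

# Nothing around the "&" is consumed, so adjacent "&"s are all removed in one pass.
print(pattern.sub("", "apple&bob&john & smith"))  # applebobjohn & smith
```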

Answer 2

As an alternative, if you also want to remove an & at the start or end of the string, as in &apple&bob & john & smith&, you can assert a non-whitespace char either to the left OR to the right:

(?<=\S)&|&(?=\S)


import re

strings = [
    "apple&bob & john & smith",
    "&apple&bob & john & smith&",
    "&apple&bob & john & smith&&"
]

for s in strings:
    print(re.sub(r"(?<=\S)&|&(?=\S)", "", s))

Output

applebob & john & smith
applebob & john & smith
applebob & john & smith
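A side-by-side comparison of the two patterns may help when choosing between them. The "both sides" version keeps the leading and trailing "&", while the "either side" version also removes an "&" that touches a word on only one side, such as in "bob &john" (a made-up input):

```python
import re

both_sides = r"(?<=\S)&(?=\S)"     # Answer 1: non-whitespace on BOTH sides
either_side = r"(?<=\S)&|&(?=\S)"  # Answer 2: non-whitespace on EITHER side

for s in ["&apple&bob & john & smith&", "bob &john"]:
    print(repr(re.sub(both_sides, "", s)), repr(re.sub(either_side, "", s)))
# '&applebob & john & smith&' 'applebob & john & smith'
# 'bob &john' 'bob john'
```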

Answer 3

You can capture the parts you want to keep and then, in the replacement string passed to the .sub() method, refer to the captured parts with \\1 and \\2.

import re
pattern = re.compile(r'(\S+)&(\S+)')
# `\S` means: any non-whitespace character.
# see: https://docs.python.org/3/library/re.html

x = "apple&bob & john & smith"
x = pattern.sub("\\1\\2", x) # or also: re.sub(pattern, "\\1\\2", x)

x
## 'applebob & john & smith'
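The re module accepts a few equivalent ways to write the same backreferences. The sketch below shows the raw-string form r"\1\2", the explicit \g<1> form, and a replacement function; all three should give the same result here:

```python
import re

pattern = re.compile(r"(\S+)&(\S+)")
x = "apple&bob & john & smith"

# "\\1\\2" in a normal string literal is the same text as r"\1\2" in a raw string.
a = pattern.sub(r"\1\2", x)
# \g<1> names group 1 unambiguously (useful when a digit follows the reference).
b = pattern.sub(r"\g<1>\g<2>", x)
# A function receives each match object and returns the replacement text.
c = pattern.sub(lambda m: m.group(1) + m.group(2), x)

print(a, b, c, sep="\n")  # each line: applebob & john & smith
```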

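However, re.sub only replaces non-overlapping matches, and because \S+ also matches &, a chained input such as abble&bob&john&smith is consumed as one greedy match, so only its last & is removed per pass. To remove them all, the substitution has to be repeated until nothing matches, for example with recursion: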
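A quick check of a single pass on the chained string abble&bob&john&smith: re.subn returns the new string together with the number of substitutions made, which makes the "one match per pass" behaviour visible:

```python
import re

s = "abble&bob&john&smith"
# The greedy first group backtracks only as far as the LAST "&",
# so the whole string is a single match and one "&" is removed.
result, count = re.subn(r"(\S+)&(\S+)", r"\1\2", s)
print(result, count)  # abble&bob&johnsmith 1
```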

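def replace_all_pattern(pattern, s):
    # re.search (not re.match) so a match anywhere in the string counts,
    # not only one that starts at position 0.
    if re.search(pattern, s):
        res = re.sub(pattern, "\\1\\2", s)
        return replace_all_pattern(pattern, res)
    else:
        return s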


replace_all_pattern(r"(\S+)&(\S+)", "abble&bob&john&smith")
## 'abblebobjohnsmith'
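If you prefer to avoid recursion (and Python's recursion limit on long chains), the same idea can be written as a loop around re.subn. The function name replace_all_iterative and its repl parameter are hypothetical, chosen only for this sketch:

```python
import re

def replace_all_iterative(pattern, s, repl=r"\1\2"):
    """Hypothetical loop-based variant: re-apply the substitution until nothing matches."""
    regex = re.compile(pattern)
    while True:
        s, n = regex.subn(repl, s)
        if n == 0:  # no match left anywhere in the string
            return s

print(replace_all_iterative(r"(\S+)&(\S+)", "abble&bob&john&smith"))  # abblebobjohnsmith
```

Each pass removes at least one "&", so the loop always terminates.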

But re-scanning the string again and again is less efficient than a single pass with look-arounds. So use this approach only if exactly one occurrence is to be replaced; in that case it may perform as well as or better than the look-arounds. As soon as more than one occurrence is possible, use the look-around pattern, which handles every & in a single pass.
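If you want to check this trade-off on your own inputs, a rough timeit comparison could look like the sketch below. The test string is made up, and timings depend on the machine and Python version, so no figures are shown; the assertion only confirms both approaches give the same result:

```python
import re
import timeit

s = "abble&bob&john&smith " * 50  # hypothetical test input

def lookaround(text):
    # Single pass: zero-width assertions, only the "&" itself is consumed.
    return re.sub(r"(?<=\S)&(?=\S)", "", text)

def repeated_groups(text):
    # Re-apply the capture-group substitution until nothing matches.
    while True:
        text, n = re.subn(r"(\S+)&(\S+)", r"\1\2", text)
        if n == 0:
            return text

assert lookaround(s) == repeated_groups(s)
for fn in (lookaround, repeated_groups):
    print(fn.__name__, timeit.timeit(lambda: fn(s), number=200))
```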

3 answers

solution1
3 2021-04-13 07:29:18

solution2
2 2021-04-13 07:53:14

solution3
1 2021-04-13 07:28:02
