I want to read in a string and delete only the captured group (in this case the "&" captured by "[^ ]+(&)[^ ]").
x = "apple&bob & john & smith" # original string
x = "applebob & john & smith" #after replacing string
This is the code I am using now.
import re
and_regex = re.compile(r'([^ ]+(&)[^ ])')
x = "apple&bob & john & smith"
x = re.sub(and_regex, " ",x)
print(x)
I cannot use str.replace because it would replace every "&" in the string, including the ones surrounded by spaces.
Thanks for the help!
You can do this with a lookbehind and a lookahead, so that only the "&" itself is matched:
import re
x = "apple&bob & john & smith"
x = re.sub(r"(?<=\S)&(?=\S)", "", x)
print(x)
output:
applebob & john & smith
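To see why the look-around version leaves the neighbouring characters alone while the pattern from the question does not, here is a small sketch that lists what each pattern actually matches. The variable names are only illustrative and are not part of the original answer.

```python
import re

x = "apple&bob & john & smith"

# Pattern from the question: the match includes the characters around "&".
question_pattern = re.compile(r'([^ ]+(&)[^ ])')
# Look-around pattern: only the "&" itself is part of the match.
lookaround_pattern = re.compile(r'(?<=\S)&(?=\S)')

question_matches = [m.group(0) for m in question_pattern.finditer(x)]
lookaround_matches = [m.group(0) for m in lookaround_pattern.finditer(x)]

print(question_matches)    # ['apple&b']
print(lookaround_matches)  # ['&']
```

Because re.sub replaces the whole match, the question's pattern swaps "apple&b" for a space, whereas the look-arounds are zero-width and only the "&" is removed.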
As an alternative, if you also want to remove the & character at the start and end of, for example, "&apple&bob & john & smith&", you can either assert a non-whitespace character to the left OR assert a non-whitespace character to the right:
(?<=\S)&|&(?=\S)
import re
strings = [
    "apple&bob & john & smith",
    "&apple&bob & john & smith&",
    "&apple&bob & john & smith&&"
]
for s in strings:
    print(re.sub(r"(?<=\S)&|&(?=\S)", "", s))
Output
applebob & john & smith
applebob & john & smith
applebob & john & smith
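As a quick comparison, the sketch below runs the "both sides" pattern and the "either side" pattern on the same input. The variable names are made up for illustration; the two patterns are the ones shown in this thread.

```python
import re

s = "&apple&bob & john & smith&"

# Requires a non-whitespace character on BOTH sides of the "&".
both_sides = re.sub(r"(?<=\S)&(?=\S)", "", s)
# Requires a non-whitespace character on AT LEAST ONE side of the "&".
either_side = re.sub(r"(?<=\S)&|&(?=\S)", "", s)

print(both_sides)   # &applebob & john & smith&
print(either_side)  # applebob & john & smith
```

The leading and trailing "&" survive the first pattern because one of its two assertions has nothing to look at there.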
You can capture the parts you want to keep. Then, when replacing with the .sub() method, refer to the captured parts with \\1 and \\2 in the replacement string.
import re
pattern = re.compile(r'(\S+)&(\S+)')
# `\S` means: any non-whitespace character.
# see: https://docs.python.org/3/library/re.html
x = "apple&bob & john & smith"
x = pattern.sub("\\1\\2", x) # or also: re.sub(pattern, "\\1\\2", x)
x
## 'applebob & john & smith'
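However, because \S+ is greedy, each run of non-whitespace characters is consumed as a single match, so one pass removes only the last & in a token such as "abble&bob&john&smith". To remove all of them, the substitution has to be repeated until nothing matches any more. One can solve it using recursion: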
def replace_all_pattern(pattern, s):
    # re.search finds a match anywhere; re.match would only look at the start of the string.
    if re.search(pattern, s):
        res = re.sub(pattern, "\\1\\2", s)
        return replace_all_pattern(pattern, res)
    else:
        return s
replace_all_pattern(r"(\S+)&(\S+)", "abble&bob&john&smith")
## 'abblebobjohnsmith'
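To see why a single pass is not enough for such a token, this short sketch inspects what the two groups capture and what one call to sub() returns. The names groups and single_pass are only illustrative.

```python
import re

pattern = re.compile(r'(\S+)&(\S+)')
s = "abble&bob&john&smith"

# The first group is greedy, so it swallows everything up to the LAST "&".
groups = pattern.search(s).groups()
# One pass therefore removes only that last "&".
single_pass = pattern.sub(r"\1\2", s)

print(groups)       # ('abble&bob&john', 'smith')
print(single_pass)  # abble&bob&johnsmith
```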
But this will be less efficient than using look-arounds, because the substitution is repeated over the whole string on every pass. So use the capture-group version only if exactly one occurrence is to be replaced; in that case it may be faster than the look-arounds. As soon as more than one occurrence is possible and has to be checked, use the look-arounds as the pattern, because they handle every occurrence in a single pass.
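If you want to check the performance claim on your own data, a rough sketch with timeit could look like the following. The function names and the sample string are assumptions made for this example, and the capture-group version uses a loop with subn() instead of recursion; the timings will vary by machine and input, so no numbers are shown.

```python
import re
import timeit

LOOKAROUND = re.compile(r"(?<=\S)&(?=\S)")
CAPTURE = re.compile(r"(\S+)&(\S+)")

def with_lookarounds(s):
    # Single pass: every "&" between two non-whitespace characters is removed.
    return LOOKAROUND.sub("", s)

def with_captures(s):
    # Repeat the substitution until a pass changes nothing.
    while True:
        s, count = CAPTURE.subn(r"\1\2", s)
        if count == 0:
            return s

sample = "abble&bob&john&smith and apple&bob & john"
# Both approaches should agree before their speed is compared.
assert with_lookarounds(sample) == with_captures(sample)

for fn in (with_lookarounds, with_captures):
    seconds = timeit.timeit(lambda: fn(sample), number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s")
```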