Hi, I'm looking for a regular expression that would let me not only replace characters but also annotate how many times each one occurs in a row.
For example, I would like to replace all special characters with "s", all letters with "c" and all numbers with "d", and annotate their occurrence count between "{}".
If I have "123-45AB-78,£", I would like to get d{3}s{1}d{2}c{2}s{1}d{2}s{2}.
Is there a way to do that with regex?
Many thanks
Here is one approach using re.sub with a callback function:
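import re

def repl(m):
    c = m.group()
    # Each match is a run of one character class; pick the tag by class.
    if re.search(r'^[A-Za-z]+$', c):
        return 'c{' + str(len(c)) + '}'
    elif re.search(r'^\d+$', c):
        return 'd{' + str(len(c)) + '}'
    else:
        return 's{' + str(len(c)) + '}'

x = "123-45AB-78!£"
# The last alternative excludes letters as well as digits; a plain \D+ would
# also swallow letters that directly follow a special character (e.g. "-AB").
print(re.sub(r'[A-Za-z]+|\d+|[^A-Za-z\d]+', repl, x))
# d{3}s{1}d{2}c{2}s{1}d{2}s{2}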
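A variation on the same callback idea, as a minimal sketch: if each character class gets its own named group, the callback can read the tag from m.lastgroup instead of re-testing the matched text. The names TOKEN and annotate are made up for illustration, and the "special" class is assumed to mean anything that is neither an ASCII letter nor a digit.

```python
import re

# Hypothetical names (TOKEN, annotate) used only for this illustration.
# One named group per class; the group name doubles as the replacement tag.
TOKEN = re.compile(r"(?P<c>[A-Za-z]+)|(?P<d>\d+)|(?P<s>[^A-Za-z\d]+)")

def annotate(text):
    # m.lastgroup holds the name of the group that matched: "c", "d" or "s".
    return TOKEN.sub(lambda m: f"{m.lastgroup}{{{len(m.group())}}}", text)

print(annotate("123-45AB-78!£"))  # d{3}s{1}d{2}c{2}s{1}d{2}s{2}
```

Since the three classes do not overlap and together cover every character, each run is matched by exactly one group.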
Here is a method that first replaces each character by its type character, then counts the runs with itertools.groupby. I'm not sure it is any faster than the good answer given by Tim, but it should be comparable.
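import re
from itertools import groupby

x = "123-45AB-78!£"
x = re.sub(r"[A-Za-z]", "c", x)  # letters -> c
x = re.sub(r"\d", "d", x)        # digits -> d
# Whatever is not yet "c" or "d" is special; [^\d\w] would leave "_" and
# non-ASCII letters untouched.
x = re.sub(r"[^cd]", "s", x)
print(x) # dddsddccsddss
y = "".join([f"{k}{{{len(list(g))}}}" for k, g in groupby(x)])
print(y) # d{3}s{1}d{2}c{2}s{1}d{2}s{2}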
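The groupby step can also do the classification itself through its key argument, which avoids the intermediate string. This is only a sketch under my own assumptions: char_type and annotate_runs are hypothetical helper names, and "letter" and "digit" are taken to mean ASCII only, which is slightly stricter than \d in a Python 3 regex.

```python
from itertools import groupby

def char_type(ch):
    # Hypothetical helper: map one character to its type tag.
    if ch.isascii() and ch.isalpha():
        return "c"  # ASCII letter
    if ch.isascii() and ch.isdigit():
        return "d"  # ASCII digit
    return "s"      # everything else counts as special

def annotate_runs(text):
    # groupby yields one (tag, run) pair per stretch of equal tags.
    return "".join(
        f"{tag}{{{sum(1 for _ in run)}}}"
        for tag, run in groupby(text, key=char_type)
    )

print(annotate_runs("123-45AB-78!£"))  # d{3}s{1}d{2}c{2}s{1}d{2}s{2}
```

str.isascii needs Python 3.7 or later.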