Python regex with number of occurrences

Hi, I'm looking for a regular expression that would let me not only replace characters but also annotate how many times each kind occurs in a row.

For example, I would like to replace all special characters with "s", all letters with "c" and all digits with "d", and annotate the length of each run between "{}".

If I have "123-45AB-78,£", I would like to get d{3}s{1}d{2}c{2}s{1}d{2}s{2}

Is there a way to do that with regex?

Many thanks

Here is one approach using re.sub with a callback function:

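import re

def repl(m):
    # Called once per match; m.group() is a run of same-type characters.
    c = m.group()
    if re.fullmatch(r'[A-Za-z]+', c):
        return 'c{' + str(len(c)) + '}'
    elif re.fullmatch(r'\d+', c):
        return 'd{' + str(len(c)) + '}'
    else:
        return 's{' + str(len(c)) + '}'

x = "123-45AB-78!£"
# Raw string avoids invalid-escape warnings. The last alternative excludes
# letters, so a run like "-AB" is not swallowed whole as special characters.
print(re.sub(r'[A-Za-z]+|\d+|[^A-Za-z\d]+', repl, x))

# d{3}s{1}d{2}c{2}s{1}d{2}s{2}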
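
A possible variant of the same idea, as a sketch rather than the answerer's code: give each alternative a named group and read m.lastgroup in the callback, so the matched text does not have to be re-tested. The group names d, c, s and the helper name annotate are chosen here for illustration only.

```python
import re

# One named group per character class; the group name doubles as the output letter.
# (Assumption: only ASCII letters count as letters, as in the [A-Za-z] class above.)
PATTERN = re.compile(r"(?P<d>\d+)|(?P<c>[A-Za-z]+)|(?P<s>[^A-Za-z\d]+)")

def annotate(text):
    # m.lastgroup is the name of the alternative that matched this run.
    return PATTERN.sub(lambda m: f"{m.lastgroup}{{{len(m.group())}}}", text)

print(annotate("123-45AB-78!£"))  # d{3}s{1}d{2}c{2}s{1}d{2}s{2}
```

Compiling the pattern once also keeps the per-call cost down if the function is applied to many strings.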

Here is a method that first replaces each character with its type character, then counts the runs with itertools.groupby. I'm not sure it is any faster than the good answer given by Tim, but it should be comparable.

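import re
from itertools import groupby

x = "123-45AB-78!£"
x = re.sub(r"[A-Za-z]", "c", x)  # letters -> c
x = re.sub(r"\d", "d", x)        # digits -> d
x = re.sub(r"[^cd]", "s", x)     # everything else -> s (also catches "_", which [^\d\w] would miss)
print(x)  # dddsddccsddss

# Collapse each run of identical type characters into "<type>{<run length>}".
y = "".join([f"{k}{{{len(list(g))}}}" for k, g in groupby(x)])
print(y)  # d{3}s{1}d{2}c{2}s{1}d{2}s{2}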
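
The substitutions could also be folded into a key function passed to groupby, which skips the intermediate string and needs no regex at all. This is only a sketch: kind and annotate_groupby are hypothetical names, and it assumes that only ASCII letters count as letters, like the [A-Za-z] class used in the answers.

```python
from itertools import groupby

def kind(ch):
    # Classify one character: decimal digit, ASCII letter, or anything else.
    if ch.isdecimal():
        return "d"
    if ch.isascii() and ch.isalpha():
        return "c"
    return "s"

def annotate_groupby(text):
    # groupby yields one group per run of consecutive characters of the same kind.
    return "".join(f"{k}{{{sum(1 for _ in g)}}}" for k, g in groupby(text, key=kind))

print(annotate_groupby("123-45AB-78!£"))  # d{3}s{1}d{2}c{2}s{1}d{2}s{2}
```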
