
Regular Expression - Remove all special characters except alphanumeric and accents

How is it possible to remove all special characters except alphanumeric characters and accented letters?

I tried something like:

text = 'abcdeáéí.@# '
re.sub(r'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]', ' ', text)

But it didn't work. The following expression keeps alphanumeric characters, but it does not keep accented letters:

tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)

Could someone help me?

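Make your text a unicode string, text = u'abcdeáéí.@# ', and make sure your pattern is a unicode string as well, so that it can match the accented characters: re.sub(u'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]', ' ', text). This matters in Python 2, where a plain '...' literal is a byte string.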

With this combination, I get u'abcde\xe1\xe9\xed    ' as a result (where \xe1 etc. are the escape codes for the accented characters in text).
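For comparison, here is a minimal sketch of the same substitution, assuming Python 3, where every str is already Unicode and the u prefix is optional. The names DISALLOWED and strip_specials are hypothetical, introduced only for illustration.

```python
import re

# Matches anything that is NOT an ASCII letter or digit, one of the accented
# letters from the question, a colon, or a space.
DISALLOWED = re.compile(r'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]')

def strip_specials(text, replacement=' '):
    """Replace every disallowed character (hypothetical helper)."""
    return DISALLOWED.sub(replacement, text)

text = 'abcdeáéí.@# '
cleaned = strip_specials(text)
print(repr(cleaned))  # '.', '@' and '#' each become a space

# Pass an empty replacement to delete the characters instead.
print(repr(strip_specials(text, '')))
```

Each of the three special characters is replaced by one space and the original trailing space is kept, so the accented letters survive and the result ends in four spaces.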

There's no need for the r prefix on the pattern if it contains no backslash escapes. It's there so you can write things like r'\d\w' instead of '\\d\\w'.
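The point about the r prefix can be checked directly. This is a small sketch, assuming Python 3; the variable names are made up for illustration.

```python
import re

# A raw string keeps backslashes literal, so both literals spell the same pattern.
raw_pattern = r'\d\w'
escaped_pattern = '\\d\\w'
same = raw_pattern == escaped_pattern
print(same)  # True

# With no backslashes in the pattern, the r prefix changes nothing.
no_difference = r'[^a-zA-Z0-9: ]' == '[^a-zA-Z0-9: ]'
print(no_difference)  # True

# The pattern matches a digit followed by a word character.
match = re.search(raw_pattern, 'x7y')
print(match.group())  # 7y
```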


 