How can I remove all special characters from a string while keeping alphanumeric characters and accented letters?
I tried something like:
text = 'abcdeáéí.@# '
re.sub(r'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]', ' ', text)
But I had no success. The following expression works for keeping only alphanumeric characters, but it does not keep accented letters:
tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)
Could someone help me?
Make your text a unicode string:
text = u'abcdeáéí.@# '
and make sure your pattern is a unicode string as well, so it can match unicode characters:
re.sub(u'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]', ' ', text)
With this combination, I get
u'abcde\xe1\xe9\xed    '
as the result (where \xe1 etc. are the escape codes for the accented characters in text, and each of the three removed characters has become a space).
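For readers on Python 3, where every str is already a Unicode string and the u prefix is optional, the same idea can be sketched as below. The helper name keep_letters_and_digits and the sample string 'cafe\u0301!' are made up for illustration and are not part of the original answer. The second variant is a broader alternative that keeps any Unicode letter or digit instead of a fixed list of accented characters, which may or may not be what you want (it also keeps letters from other scripts).

```python
import re
import unicodedata

text = 'abcdeáéí.@# '

# Same whitelist as in the answer; Python 3 needs no u prefix.
WHITELIST = '[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]'
explicit = re.sub(WHITELIST, ' ', text)

def keep_letters_and_digits(s, replacement=' '):
    """Hypothetical helper: keep Unicode letters/digits, ':' and ' '."""
    # Compose 'e' + combining accent into a single 'é' first, so the
    # accent is not treated as a separate, non-letter character.
    s = unicodedata.normalize('NFC', s)
    # \w matches Unicode letters, digits and '_' on str patterns,
    # so '_' is removed explicitly by the second alternative.
    return re.sub(r'[^\w: ]|_', replacement, s)

broad = keep_letters_and_digits(text)
decomposed = keep_letters_and_digits('cafe\u0301!')

print(repr(explicit))
print(repr(broad))
print(repr(decomposed))
```

The explicit whitelist is the safer choice when only a known set of accented letters should survive; the \w variant trades that strictness for not having to enumerate them.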
There's no need for the r prefix in front of the pattern if you aren't escaping any characters. It's there so you can write things like r'\d\w' instead of '\\d\\w'.
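As a quick check of the point about the r prefix, the following minimal sketch compares raw and regular string literals. The variable names and the sample input 'x 4a y' are illustrative only.

```python
import re

# A raw string keeps backslashes literal, so both literals below
# spell the same two-token pattern: a digit followed by a word char.
same_with_escapes = r'\d\w' == '\\d\\w'

# With no backslash in the pattern, the r prefix changes nothing.
same_without_escapes = r'[^a-zA-Z0-9: ]' == '[^a-zA-Z0-9: ]'

# Either spelling can be passed to re; here the raw form is used.
match = re.search(r'\d\w', 'x 4a y')
found = match.group(0)

print(same_with_escapes, same_without_escapes, found)
```

Many people keep the r prefix on every pattern anyway, so that adding an escape later does not silently change the pattern's meaning.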