
Python Regular Expression Escape or not

I need to write a regular expression that keeps only the characters in the list below (that is, removes every character not in the list).

allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

I don't know how to do it. Should I even use re.match, re.findall, or re.sub?

Thanks a lot in advance.

Don't use regular expressions at all. First convert allow_characters to a set, then use ''.join() with a generator expression that keeps only the allowed characters. Assuming the string you are transforming is called s:

allow_char_set = set(allow_characters)
s = ''.join(c for c in s if c in allow_char_set)
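For illustration, here is the same idea as a small self-contained sketch. The function name keep_allowed and the sample input are made up for this example and are not part of the original answer:

```python
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# Build the set once so each membership test is a fast hash lookup.
allow_char_set = set(allow_characters)

def keep_allowed(s):
    """Return s with every character not in allow_char_set removed."""
    return ''.join(c for c in s if c in allow_char_set)

print(keep_allowed("hello, world! #tag-1_a.b"))  # prints: helloworld#tag-1_a.b
```

Building the set outside the function means it is not rebuilt on every call, which matters if you filter many strings.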

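That being said, here is how this might look with regex (after import re). The hyphen is escaped so that it is not read as a range inside the character class; #, . and _ need no escaping there: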

s = re.sub(r'[^#.\-_a-zA-Z0-9]+', '', s)

You could convert your allow_characters string into this regex, but I think the first solution is significantly more straightforward.
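If you do want to build the pattern from allow_characters rather than write it by hand, one way to sketch it is with re.escape, which takes care of the "escape or not" question for you. The names pattern and strip_disallowed are hypothetical, chosen for this example:

```python
import re

allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# re.escape backslash-escapes characters that can be special in a pattern
# (the '-' is the one that matters inside a character class).
# The leading '^' negates the class, so the pattern matches runs of
# characters that are NOT allowed.
pattern = re.compile('[^' + re.escape(allow_characters) + ']+')

def strip_disallowed(s):
    """Remove every run of characters that is not in allow_characters."""
    return pattern.sub('', s)
```

This should behave the same as the hand-written pattern above, at the cost of a longer character class, since every letter and digit is listed individually instead of as a range.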

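Edit: As pointed out by DSM in the comments, str.translate() is often a very good way to do something like this. In this case it is slightly more complicated, but you can still use it like this (this is Python 2 code; string.maketrans and the two-argument form of str.translate are not available on Python 3 str objects):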

import string

allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
all_characters = string.maketrans('', '')
delete_characters = all_characters.translate(None, allow_characters)

s = s.translate(None, delete_characters)
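On Python 3, str.translate() takes a single mapping from code points to replacements, where a value of None deletes the character. One possible port, offered as a sketch rather than the original answer's method, is a dict subclass whose __missing__ returns None, so that anything not explicitly listed is deleted. The names KeepOnly and keep_allowed_translate are invented for this example:

```python
allow_characters = "#.-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

class KeepOnly(dict):
    """Translation table: listed code points map to themselves,
    every other code point maps to None (meaning: delete it)."""
    def __missing__(self, key):
        return None

# Map each allowed character's code point to itself.
table = KeepOnly((ord(c), ord(c)) for c in allow_characters)

def keep_allowed_translate(s):
    """Return s with every character not in allow_characters removed."""
    return s.translate(table)

print(keep_allowed_translate("héllo, wörld #1"))  # prints: hllowrld#1
```

Unlike the Python 2 version, this does not need to enumerate every possible character up front, which would be impractical for the full Unicode range.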
