Let's say the input string is
s_in = 'auto encoder'
and the list of strings is
l_s = ['autoencoder', 'auto-encoder', 'auto', 'one']
My goal is to match s_in against its possible forms in l_s, so that I get back all matching strings from the list.
In the example above, the output must be ['autoencoder', 'auto-encoder'].
Another example:
s_in = 'autoencoder'
l_s = ['auto-encoder', 'auto encoder', 'auto', 'one']
Output: ['auto-encoder', 'auto encoder']
Or
s_in = 'auto-encoder'
l_s = ['autoencoder', 'auto encoder', 'auto', 'one']
Output: ['autoencoder', 'auto encoder']
The regex I constructed looks like this:
re.match(r'^[a-zA-Z]+(?:(?:\s[a-zA-Z]+)+|(?:\-[a-zA-Z]+)|(?:[a-zA-Z]+))$', s)
It works well if I just iterate over the list items, but it doesn't work when I try to combine the input string with the list of strings.
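A quick check suggests why: the pattern only describes the shape of a single string and never refers to s_in, so every item in the list passes it, including 'auto' and 'one'. The snippet below is a minimal sketch of that check; the names pattern and shape_matches are introduced here for illustration only.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'^[a-zA-Z]+(?:(?:\s[a-zA-Z]+)+|(?:\-[a-zA-Z]+)|(?:[a-zA-Z]+))$')

l_s = ['autoencoder', 'auto-encoder', 'auto', 'one']

# s_in is not part of the pattern, so the result is the same for any input string.
shape_matches = [x for x in l_s if pattern.match(x)]
print(shape_matches)
# => ['autoencoder', 'auto-encoder', 'auto', 'one']
```

Since the pattern accepts any run of two or more letters, it cannot tell which items are forms of the input.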
You can compare the strings after removing all special characters (anything that is not a letter or digit), say, with the [\W_]+ pattern:
import re
s_in = 'auto encoder'
l_s = ['autoencoder', 'auto-encoder', 'auto', 'one']
rx = re.compile(r'[\W_]+') # Define the regex for non-alnum chars
s_check = rx.sub('', s_in) # Input string without non-alnum chars
print([x for x in l_s if s_check == rx.sub('', x)])  # Print items equal to the input after removing all non-alnum chars
# => ['autoencoder', 'auto-encoder']
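If you need this for several inputs, the same idea can be wrapped in a small helper. This is only a sketch: normalize and match_forms are hypothetical names, not part of the re module, and the comparison stays case-sensitive as in the code above.

```python
import re

# Same pattern as above: runs of non-alphanumeric characters, underscore included.
_NON_ALNUM = re.compile(r'[\W_]+')

def normalize(s):
    """Return s with all non-alphanumeric characters removed."""
    return _NON_ALNUM.sub('', s)

def match_forms(s_in, candidates):
    """Return the candidates that equal s_in once both are normalized."""
    key = normalize(s_in)
    return [c for c in candidates if normalize(c) == key]

# The other two examples from the question:
print(match_forms('autoencoder', ['auto-encoder', 'auto encoder', 'auto', 'one']))
# => ['auto-encoder', 'auto encoder']
print(match_forms('auto-encoder', ['autoencoder', 'auto encoder', 'auto', 'one']))
# => ['autoencoder', 'auto encoder']
```

If 'Auto-Encoder' should also count as a match, one option is to lowercase both sides inside normalize, but that goes beyond what the question asks for.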