Regular expression for matching various forms of a string

Question

Suppose the input string is

s_in = 'auto encoder'

and the list of strings is

l_s = ['autoencoder', 'auto-encoder', 'auto', 'one']

My goal is to match s_in against its possible variant forms in l_s and get back every string in the list that matches.

In the example above, the output must be ['autoencoder', 'auto-encoder']

Another example:

s_in = 'autoencoder'    
l_s = ['auto-encoder', 'auto encoder', 'auto', 'one']

Output: ['auto-encoder', 'auto encoder']

Or:

s_in = 'auto-encoder'    
l_s = ['autoencoder', 'auto encoder', 'auto', 'one']

Output: ['autoencoder', 'auto encoder']

The regex I built looks like this:

re.match(r'^[a-zA-Z]+(?:(?:\s[a-zA-Z]+)+|(?:\-[a-zA-Z]+)|(?:[a-zA-Z]+))$', s)

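It works fine when I just iterate over the list items and test each one, but it does not work when I try to combine the input string with the list of strings.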

Answer 1

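You can compare the strings after removing all special characters, for example with the [\W_]+ pattern, which matches any run of non-alphanumeric characters (including underscores):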

import re
s_in = 'auto encoder'
l_s = ['autoencoder', 'auto-encoder', 'auto', 'one']

rx = re.compile(r'[\W_]+')  # Define the regex for non-alnum chars
s_check = rx.sub('', s_in)  # Input string without non-alnum chars
print( [x for x in l_s if s_check == rx.sub('', x)] ) # Print if equal after removing all non-alnum chars
# => ['autoencoder', 'auto-encoder']

See the Python demo.
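To check this approach against all three examples from the question, the comparison can be wrapped in a small function. This is only a sketch: the name match_variants is made up for illustration, and it assumes that "the same form" means "equal once every non-alphanumeric character and underscore is removed".

```python
import re

# Same pattern as in the answer: one or more non-alphanumeric chars or underscores.
_SEPARATORS = re.compile(r'[\W_]+')

def match_variants(s_in, candidates):
    """Hypothetical helper: return the candidates that equal s_in
    once separators (spaces, hyphens, underscores, ...) are removed."""
    key = _SEPARATORS.sub('', s_in)
    return [c for c in candidates if _SEPARATORS.sub('', c) == key]

print(match_variants('auto encoder', ['autoencoder', 'auto-encoder', 'auto', 'one']))
# => ['autoencoder', 'auto-encoder']
print(match_variants('autoencoder', ['auto-encoder', 'auto encoder', 'auto', 'one']))
# => ['auto-encoder', 'auto encoder']
print(match_variants('auto-encoder', ['autoencoder', 'auto encoder', 'auto', 'one']))
# => ['autoencoder', 'auto encoder']
```

Note that the comparison is case-sensitive, and a list item identical to s_in would also be returned; if either matters, lowercase both sides or filter out s_in explicitly.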

Answer 1
2 Accepted 2020-11-04 09:44:56