How do you use a regular expression in a list comprehension in Python?

Question

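I'm trying to find all index positions of a string in a list of words, and I want the values returned as a list. I would like to find the string if it stands on its own, or if it is preceded or followed by punctuation, but not if it is a substring of a larger word.

The following code only captures 'cow' and misses both 'test;cow' and 'cow.'.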

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == myString]
print(indices)
>> [5]

I tried changing the code to use a regular expression:

import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if x == re.match('\W*myString\W*', myList)]
print(indices)

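But this gives an error: expected string or buffer

If anyone knows what I'm doing wrong, I'd be very happy to hear it. I have a feeling it has something to do with the fact that I'm trying to use a regular expression where a string is expected. Is there a solution?

The output I'm looking for is: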

>> [0, 4, 5]

Thanks

Answer 1

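You don't need to compare the result of match back to x. And the match should be run against x (each element), not against the whole list.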

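Also, you need to use re.search instead of re.match, because the pattern '\W*myString\W*' will not match the first element: re.match only matches at the start of the string, and the leading test in 'test;cow' is not matched by \W*. In fact, you only need to test the characters immediately before and after the string, not the complete element.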
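To make the difference concrete, here is a minimal sketch using only the standard re module. The variable names are illustrative and not from the original post, and the pattern has 'cow' written in by hand:

```python
import re

pattern = r'\W*cow\W*'

# re.match only tries at position 0, where 't' is a word character
match_result = re.match(pattern, 'test;cow')

# re.search scans the whole string and finds ';cow'
search_result = re.search(pattern, 'test;cow')

# \W* may match zero characters, so re.search also accepts 'acow'
substring_result = re.search(pattern, 'acow')

print(match_result)              # None
print(search_result.group())     # ;cow
print(substring_result.group())  # cow
```

The last line is why switching to re.search alone is not enough: \W* is optional, so it does nothing to reject a substring of a larger word.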

So, you can use word boundaries around the string instead:

pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]
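Put together with the data from the question, the two lines above can be run as a self-contained script (written with print() so it works on both Python 2.7 and Python 3):

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'

# \b matches between a word character and a non-word character or string edge
pattern = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(pattern, x)]

print(indices)  # [0, 4, 5]
```

'acow' is skipped because there is no word boundary between 'a' and 'c'.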

Answer 2

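There are a few problems with your code. First, you need to match the expression against the list element (x), not against the whole list (myList). Second, to insert a variable into the expression, you have to use + (string concatenation). And finally, use raw string literals (r'\W') so that the backslashes reach the regex engine intact: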

import re
myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'
indices = [i for i, x in enumerate(myList) if re.match(r'\W*' + myString + r'\W*', x)]
print(indices)
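The second point is easy to miss, so here is a small sketch comparing the two patterns side by side. The names literal_pattern and built_pattern are made up for illustration:

```python
import re

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
myString = 'cow'

# Inside quotes, myString is just the eight literal characters "myString"
literal_pattern = r'\W*myString\W*'
# Concatenation inserts the value of the variable instead
built_pattern = r'\W*' + myString + r'\W*'

literal_hits = [i for i, x in enumerate(myList) if re.match(literal_pattern, x)]
built_hits = [i for i, x in enumerate(myList) if re.match(built_pattern, x)]

print(literal_hits)  # []
print(built_hits)    # [4, 5]
```

Index 0 is still missing from the second result because re.match only matches at the start of each element, which is what the re.search variant further down addresses.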

If there's a chance that myString contains special regex characters (like a backslash or a dot), you'll also need to apply re.escape to it:

regex = r'\W*' + re.escape(myString) + r'\W*'
indices = [i for i, x in enumerate(myList) if re.match(regex, x)]
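As a quick illustration of what re.escape changes, consider a hypothetical search string containing a dot. The sample values below are invented for this sketch:

```python
import re

# Hypothetical search string containing a regex metacharacter
myString = 'c.w'
words = ['cow', 'c.w']

unescaped = r'\W*' + myString + r'\W*'
escaped = r'\W*' + re.escape(myString) + r'\W*'

unescaped_hits = [i for i, x in enumerate(words) if re.match(unescaped, x)]
escaped_hits = [i for i, x in enumerate(words) if re.match(escaped, x)]

print(unescaped_hits)  # [0, 1]
print(escaped_hits)    # [1]
```

Without escaping, the dot matches any character, so 'cow' is reported as a hit for 'c.w'.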

As pointed out in the comments, the following might be a better option:

regex = r'\b' + re.escape(myString) + r'\b'
indices = [i for i, x in enumerate(myList) if re.search(regex, x)]
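One caveat, offered as a sketch rather than as part of the original answers: \b only behaves as intended when the search string itself starts and ends with a word character. If that cannot be assumed, lookarounds that forbid an adjacent word character are one possible alternative. The helper name find_indices and the 'c++' sample data are hypothetical:

```python
import re

def find_indices(words, target):
    # Hypothetical helper: no word character directly before or after target
    pattern = re.compile(r'(?<!\w)' + re.escape(target) + r'(?!\w)')
    return [i for i, word in enumerate(words) if pattern.search(word)]

myList = ['test;cow', 'one', 'two', 'three', 'cow.', 'cow', 'acow']
print(find_indices(myList, 'cow'))  # [0, 4, 5]

# Made-up sample data where the target ends in punctuation
symbols = ['c++', 'c++;', 'abc++', 'c++x']
print(find_indices(symbols, 'c++'))  # [0, 1]

# The \b version picks the wrong element for this target
boundary = r'\b' + re.escape('c++') + r'\b'
boundary_hits = [i for i, s in enumerate(symbols) if re.search(boundary, s)]
print(boundary_hits)  # [3]
```

For plain words such as 'cow' both approaches give the same result, so the \b version above remains the simpler choice for the data in the question.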
