Regex to match non-alphanumeric characters

Question

I'm using Python to parse some strings in a list. Some of the strings may contain only non-alphanumeric characters, which I'd like to ignore, like this:

import re

list = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']

for item in list:
    if re.search(r'\W+', item):
        list.remove(item)

# Ideal output
list = ['desk', 'apple', ':desk', '(house']

# Actual output
list = ['desk', 'apple', '(:', '(', '))']

That's my first attempt at the regex for this problem, but it's not really having the desired effect. How would I write a regex to ignore any strings with non-alphanumeric characters?

Answer 1

By the way, your regex does match non-alphanumeric characters. However, it isn't advisable to remove items from a list you're currently iterating over, and that is the cause of this error. To get around it, create a new list and append to it the elements that don't match.


Demo:

import re

list = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']
new_list = []

for item in list:
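    # Keep the item unless it consists solely of non-word characters.
    # (An extra "or re.search(r'^\w+', item)" test would be redundant here.)
    if not re.search(r'^\W+$', item):
        new_list.append(item)

print(new_list)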

Produces:

['desk', 'apple', ':desk', '(house']

As far as I have tested, this works in nearly all scenarios.
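The same condition can be stated a little more directly, offered here as a sketch rather than as part of the original answer: keep an item if it contains at least one word character. Two caveats apply. `\w` also matches the underscore, so a string such as '_' would be kept, and an empty string is dropped here, whereas the `^\W+$` test above would keep it. The helper name `has_word_char` is hypothetical.

```python
import re

def has_word_char(item):
    # True when the string contains at least one \w character
    # (a letter, a digit, or an underscore).
    return re.search(r'\w', item) is not None

items = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']
new_list = [item for item in items if has_word_char(item)]
print(new_list)  # ['desk', 'apple', ':desk', '(house']
```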

Answer 2

What about a list comprehension with re.match(pattern, string):

import re

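items = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']
# re.match anchors at the start: an optional non-word character, then word characters.
cleaned_items = [item for item in items if re.match(r'\W?\w+', item)]
print(cleaned_items)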

This prints:

['desk', 'apple', ':desk', '(house']
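One thing to keep in mind with this pattern: because re.match anchors at the start of the string, `\W?\w+` tolerates at most one leading non-word character. The sketch below contrasts it with a relaxed pattern, `\W*\w`, which is an assumption about what might be wanted and not part of the answer above. The sample list is made up for illustration.

```python
import re

# Hypothetical sample data, including an item with two leading symbols.
items = ['desk', '(house', '((house', '()', ':(']

# At most one leading non-word character is allowed before the word characters.
strict = [item for item in items if re.match(r'\W?\w+', item)]

# Any number of leading non-word characters, followed by a word character.
relaxed = [item for item in items if re.match(r'\W*\w', item)]

print(strict)   # ['desk', '(house']
print(relaxed)  # ['desk', '(house', '((house']
```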

Answer 3

The problem is not with your regex. You are iterating over a list that you are modifying at the same time, which causes weirdness (see Modifying list while iterating). You can use a list comprehension like the one Jon posted, or you can iterate over a copy of the list: for item in list[:]:
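A minimal sketch of the difference, assuming the `^\W+$` pattern from Answer 1 and using the names `original`, `buggy`, and `fixed` purely for illustration:

```python
import re

original = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']

# Removing from the list being iterated: each removal shifts the
# remaining items left, so the item right after a removed one is skipped.
buggy = list(original)
for item in buggy:
    if re.search(r'^\W+$', item):
        buggy.remove(item)

# Iterating over a shallow copy (fixed[:]) while removing from the
# real list visits every item exactly once.
fixed = list(original)
for item in fixed[:]:
    if re.search(r'^\W+$', item):
        fixed.remove(item)

print(buggy)  # ['desk', 'apple', ':desk', '(house', '(:', '(', '))']
print(fixed)  # ['desk', 'apple', ':desk', '(house']
```

The leftover '(:', '(' and '))' in the first result are exactly the items that followed a removed element and were never examined.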

3 answers

Answer 1
6 2013-12-10 16:42:08

Answer 2
2 2013-12-10 17:01:08

Answer 3
0 2013-12-10 16:48:14
