Regex matching non-alphanumeric characters

I'm using Python to parse some strings in a list. Some of the strings may contain only non-alphanumeric characters, which I'd like to ignore, like this:

list = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']

for item in list:
    if re.search(r'\W+', item):
        list.remove(item)

# Ideal output
list = ['desk', 'apple', ':desk', '(house']

# Actual output
list = ['desk', 'apple', '(:', '(', '))']

That's my first attempt at the regex for this problem, but it's not really having the desired effect. How would I write a regex to ignore any strings made up only of non-alphanumeric characters?

BTW, your regex does match non-alphanumeric characters. However, it isn't advisable to remove items from a list you're currently iterating over, and that is the cause of this error. To get around it, create a new list and append to it the elements that don't match.

Demo:

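import re

# Named "items" rather than "list" so the built-in list type is not shadowed.
items = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']
new_list = []

for item in items:
    # Keep the item unless it consists entirely of non-word characters.
    # (The extra "or re.search(r'^\w+', item)" test was redundant: anything
    # that starts with a word character already fails the ^\W+$ match.)
    if not re.search(r'^\W+$', item):
        new_list.append(item)

print(new_list)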

Produces:

['desk', 'apple', ':desk', '(house']

As far as I have tested, this works in nearly all scenarios.
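One scenario worth checking is the underscore: \w accepts it, so a string such as '__' counts as a word and survives a ^\W+$ filter. Below is a minimal sketch of a stricter variant, assuming Python 3.4+ (for fullmatch) and assuming underscores should be treated as symbols. The helper name drop_symbol_only and the sample strings are hypothetical, not part of the answer above.

```python
import re

# Strings made up solely of characters that are neither letters nor digits.
# [\W_] is \W plus the underscore, which \w would otherwise accept.
SYMBOLS_ONLY = re.compile(r'[\W_]+')


def drop_symbol_only(strings):
    """Hypothetical helper: drop strings consisting only of symbols or underscores.

    Everything else is kept, including the empty string, because the
    pattern needs at least one character to match.
    """
    return [s for s in strings if not SYMBOLS_ONLY.fullmatch(s)]


print(drop_symbol_only(['()', 'desk', '__', ':desk', '(house', ':(']))
# prints ['desk', ':desk', '(house']
```

fullmatch has to cover the whole string, so the ^ and $ anchors are not needed here.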

What about a list comprehension with re.match(pattern, string):

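import re

# Same input as in the question, including ':desk' and '(house'.
items = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']
# Keep items that start with at most one non-word character followed by word characters.
cleaned_items = [item for item in items if re.match(r'\W?\w+', item)]
print(cleaned_items)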

This prints:

['desk', 'apple', ':desk', '(house']
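Note that \W? allows at most one leading symbol, and re.match only looks at the start of the string. If the data can contain more than one leading symbol, a looser test may be a better fit. The sketch below compares the two, assuming that any string with at least one word character should be kept. The '((house' entry is a made-up value added to show the difference.

```python
import re

items = ['()', 'desk', 'apple', ':desk', '((house', ')', '(:', ':(']

# re.match anchors at the start, and \W? allows at most one leading symbol,
# so '((house' is rejected.
with_match = [item for item in items if re.match(r'\W?\w+', item)]

# re.search scans the whole string for any single word character,
# so '((house' is kept.
with_search = [item for item in items if re.search(r'\w', item)]

print(with_match)   # prints ['desk', 'apple', ':desk']
print(with_search)  # prints ['desk', 'apple', ':desk', '((house']
```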

The problem is not with your regex. You are iterating over a list that you are modifying at the same time, which causes weirdness (see Modifying list while iterating). You can use a list comprehension like Jon posted, or you can iterate over a copy of the list: for item in list[:]:
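A minimal sketch of the copy-based loop, assuming Python 3 and a list named items rather than list. One caveat: with the question's original unanchored pattern r'\W+', a copy-based loop would also remove ':desk' and '(house', since both contain a non-word character. The sketch therefore uses the anchored pattern r'^\W+$' to reach the ideal output.

```python
import re

items = ['()', 'desk', 'apple', ':desk', '(house', ')', '(:', ')(', '(', ':(', '))']

# Iterate over a shallow copy (items[:]) so that removals from the real
# list do not shift the positions the loop still has to visit.
for item in items[:]:
    # Anchored pattern: matches only strings made up entirely of non-word characters.
    if re.search(r'^\W+$', item):
        items.remove(item)

print(items)  # prints ['desk', 'apple', ':desk', '(house']
```

Because remove() deletes the first element equal to its argument, this still leaves the expected result when the list contains duplicates, although a list comprehension is usually simpler to reason about.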
