How do I clean up a list using regex or replace?

Question

This seems like a very obvious mistake which I have been trying to solve for almost an hour now. :(

lst = ['\xa0\xa0+11-9188882266\xa0\xa0+01-9736475634 ','\xa0\xa0+11-9177772266\xa0\xa0+01-9736475234']

I am trying to grab numbers, hyphens and the + sign only. Basically, remove all the \xa0.

I thought that regex would be the right way to go about it. Tried it and failed:

mRegex = (['+0-9-'])
lst = re.match(mRegex,lst)

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python34\lib\re.py", line 160, in match
    return _compile(pattern, flags).match(string)
  File "C:\Python34\lib\re.py", line 282, in _compile
    p, loc = _cache[type(pattern), pattern, flags]
TypeError: unhashable type: 'list'

I gave it a few more tries with regex, then switched to replace:

h.replace(r"\\xa0","")

It doesn't do anything to the lst. It stays exactly the same.

When I do a len(lst[0]) I get 33, which is very odd.

In a loop like:

for i in lst[0]:
    print(i)

the output doesn't show \xa0.

I am completely confused here.

Answer 1

First, you cannot apply replacement/regex on a list. You have to apply them to each string, and use a list comprehension to rebuild the cleaned-up list.

Second, when you call replace you're using the raw prefix, which you shouldn't, since it treats \x literally: r"\xa0" is the four characters backslash, x, a, 0, not the single non-breaking space character you want to remove.

I'd do:

lst = [x.replace("\xa0","") for x in lst]

results in:

['+11-9188882266+01-9736475634 ', '+11-9177772266+01-9736475234']
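To make the raw-prefix point concrete, here is a small sketch comparing the two literals on the first string from the question. The variable names (`s`, `unchanged`, `cleaned`) are only for illustration.

```python
# "\xa0" is one character (U+00A0, the non-breaking space);
# r"\xa0" is four characters: backslash, x, a, 0.
s = '\xa0\xa0+11-9188882266\xa0\xa0+01-9736475634 '

raw_len = len(r"\xa0")
plain_len = len("\xa0")

# The raw version searches for a literal backslash sequence that is not in s,
# so the string comes back unchanged.
unchanged = s.replace(r"\xa0", "")

# The non-raw version removes the four non-breaking spaces.
cleaned = s.replace("\xa0", "")

print(raw_len, plain_len)
print(unchanged == s)
print(repr(cleaned))
```

This also accounts for the length of 33: 28 visible characters, four non-breaking spaces and one trailing space.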

And BTW: mRegex = (['+0-9-']) doesn't work because you're basically defining a list of 1 string (the parentheses do nothing here). You probably meant mRegex = '([0-9\-+])'
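As a side note that goes slightly beyond the answer, the sketch below shows the list-pattern failure and, assuming the corrected string pattern, why re.match would still return nothing on these strings: it only tries to match at position 0, where the string holds a non-breaking space. The names `list_pattern_failed`, `at_start` and `first_hit` are illustrative.

```python
import re

s = '\xa0\xa0+11-9188882266\xa0\xa0+01-9736475634 '

# A list is not a valid pattern, so re raises TypeError.
try:
    re.match(['+0-9-'], s)
    list_pattern_failed = False
except TypeError:
    list_pattern_failed = True

pattern = r'([0-9\-+])'

# re.match only tries at the start of the string, which is '\xa0' here.
at_start = re.match(pattern, s)

# re.search scans forward and finds the first '+' at index 2.
first_hit = re.search(pattern, s)

print(list_pattern_failed)
print(at_start)
print(first_hit.group(1), first_hit.start())
```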

A regex solution would be:

lst = [re.sub(r"[^\d+\-]","",x) for x in lst]

(removes chars not matching the char class, and \d is (roughly) equivalent to 0-9)
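Run end to end on the list from the question, the regex version looks like this (a self-contained sketch; `cleaned` is an illustrative name):

```python
import re

lst = ['\xa0\xa0+11-9188882266\xa0\xa0+01-9736475634 ',
       '\xa0\xa0+11-9177772266\xa0\xa0+01-9736475234']

# Delete every character that is not a digit, '+' or '-'.
cleaned = [re.sub(r"[^\d+\-]", "", x) for x in lst]

print(cleaned)
# ['+11-9188882266+01-9736475634', '+11-9177772266+01-9736475234']
```

Unlike the replace version, this also drops the trailing space of the first string, because a space is not in the character class either.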

After a few years I realize (after reading the OP's comment properly this time) that the expected result is probably the numbers separated in a list, so removing \xa0 isn't a good idea, because it runs the numbers together. Let's just use split on each string:

>>> lst = ['\xa0\xa0+11-9188882266\xa0\xa0+01-9736475634 ','\xa0\xa0+11-9177772266\xa0\xa0+01-9736475234']
>>> [x.split() for x in lst]
[['+11-9188882266', '+01-9736475634'], ['+11-9177772266', '+01-9736475234']]

Actually, using split() works because \xa0 is seen as a whitespace character (Windows uses it, for instance), and split() with no argument also treats runs of whitespace as a single separator, so the result is given straight away without further hassle.
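A quick way to check the whitespace claim, plus a possible variant that is only an assumption about what the OP wants: one flat list instead of a list of lists. This is a sketch for Python 3 str objects, and `flat` is an illustrative name.

```python
lst = ['\xa0\xa0+11-9188882266\xa0\xa0+01-9736475634 ',
       '\xa0\xa0+11-9177772266\xa0\xa0+01-9736475234']

# In Python 3, the non-breaking space counts as whitespace for str methods.
nbsp_is_space = '\xa0'.isspace()

# Nested comprehension: split each string, then collect every number
# into a single flat list.
flat = [number for x in lst for number in x.split()]

print(nbsp_is_space)
print(flat)
```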

Answer 1: score 7, accepted, 2017-01-24 18:58:00