How to replace non-alphabetic and numeric characters in a string in Python

Question

I understand that the code to replace non-alphanumeric characters in a string would be as follows:

words = re.sub("[^\w]", " ",  str).split()

However, ^\w only replaces non-alphanumeric characters. I want to replace both non-alphabetic characters and digits in a string like:

"baa!!!!! baa sheep23? baa baa"

and I want the result to look like this:

 "baa baa sheep baa baa"

If I do words = re.sub("[^\w\d]", " ", str).split(), I get a result that still contains numeric characters, like 'sheep23'. I guess this could be because "^" affects \d as well, so it is treated as if I wanted only the non-numeric characters removed. How do I do this right?

Answer 1

Use str.translate (the two-argument call below is the Python 2 form):

>>> from string import punctuation, digits
>>> s = "baa!!!!! baa sheep23? baa baa"
>>> s.translate(None, punctuation+digits)
'baa baa sheep baa baa'
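In Python 3, str.translate takes a single translation table rather than a list of characters to delete. A rough equivalent, assuming ASCII punctuation and digits are all that need removing, might look like this (the variable names are illustrative):

```python
from string import punctuation, digits

s = "baa!!!!! baa sheep23? baa baa"

# str.maketrans("", "", chars) builds a table that deletes every character in chars.
table = str.maketrans("", "", punctuation + digits)
cleaned = s.translate(table)
print(cleaned)  # baa baa sheep baa baa
```

Building the table once and reusing it is worthwhile when many strings are cleaned the same way.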

Answer 2

No need for a regex here; a simple comprehension will work:

>>> import string
>>> word = "baa!!!!! baa sheep23? baa baa"
>>> "".join([l for l in word if l in string.ascii_letters+string.whitespace])
'baa baa sheep baa baa'
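If the input may contain non-ASCII letters, a variant of the same comprehension can test each character with str.isalpha() and str.isspace() instead of looking it up in the ASCII constants. This is only a sketch under that assumption; keep_letters is a hypothetical helper name, not part of the original answer.

```python
def keep_letters(text):
    """Keep only letters and whitespace, using Unicode-aware str methods."""
    return "".join(ch for ch in text if ch.isalpha() or ch.isspace())

result = keep_letters("baa!!!!! baa sheep23? baa baa")
print(result)  # baa baa sheep baa baa
```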

Answer 3

Try this regex:

[^a-zA-Z]

This matches anything that is not a letter.

Or this, if you want to keep spaces:

[^a-zA-Z\s]
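To make the two patterns concrete, here is how they might be applied with re.sub to the string from the question. Replacing matches of the first pattern with a space and then calling split() gives the word list the asker was after; the variable names are illustrative.

```python
import re

s = "baa!!!!! baa sheep23? baa baa"

# Replace every non-letter with a space, then split on runs of whitespace.
words = re.sub(r"[^a-zA-Z]", " ", s).split()
print(words)  # ['baa', 'baa', 'sheep', 'baa', 'baa']

# Keep whitespace: delete only characters that are neither letters nor whitespace.
kept = re.sub(r"[^a-zA-Z\s]", "", s)
print(kept)   # baa baa sheep baa baa
```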

Answer 4

What about this regex?

[^\w]|\d

EDIT:

As @Avinash pointed out, this does not remove _. If you also want to remove _, use:

[^\w]|[\d_]

and if you also want to replace multiple spaces with a single one, use:

([^\w]|[\d_])+

Here's your example with some underscores added:

In [1]: import re

In [2]: s = "baa!!!!! baa sheep23? baa baa___"

In [3]: re.sub(r"([^\w]|[\d_])+", " ", s)
Out[3]: 'baa baa sheep baa baa '

In [4]: re.sub(r"([^\w]|[\d_])+", " ", s).split()
Out[4]: ['baa', 'baa', 'sheep', 'baa', 'baa']

Answer 5

Using the re.sub function:

>>> import re
>>> s = "baa!!!!! baa sheep23? baa baa"
>>> m = re.sub(r'[^A-Za-z ]', "", s)
>>> m
'baa baa sheep baa baa'

Answer 6

Instead of replacing every non-letter with a space and then splitting, you can do it all in one go:

>>> re.split("[^a-zA-Z]+", "baa!!!!! baa sheep23? baa baa")
['baa', 'baa', 'sheep', 'baa', 'baa']

[^\w] is equivalent to [^a-zA-Z0-9_] (modulo locale and Unicode settings). You need to keep in your character class only what you want, and [^a-zA-Z] obviously matches spaces as well.
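A quick check makes the point about the question's attempt: inside a negated class, adding \d to \w changes nothing, because \w already covers digits (and the underscore). The sketch below uses only the standard re module on the question's sample string; the variable names are illustrative.

```python
import re

s = "baa!!!!! baa sheep23? baa baa"

# \w already matches digits and "_", so [^\w\d] behaves exactly like [^\w].
with_w = re.sub(r"[^\w]", " ", s).split()
with_w_d = re.sub(r"[^\w\d]", " ", s).split()
print(with_w == with_w_d)  # True
print(with_w)              # ['baa', 'baa', 'sheep23', 'baa', 'baa']

# Listing only the wanted characters in the negated class drops the digits.
letters_only = re.sub(r"[^a-zA-Z]", " ", s).split()
print(letters_only)        # ['baa', 'baa', 'sheep', 'baa', 'baa']
```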

6 answers

Answer 1
8 2014-08-05 13:55:15

Answer 2
3 2014-08-05 13:55:02

Answer 3
2 2014-08-05 13:54:12

Answer 4
1 (accepted) 2014-08-05 13:58:27

Answer 5
0 2014-08-05 14:08:53

Answer 6
0 2014-08-05 14:09:42
