How to replace non-alphabetic AND numeric characters in a string in Python
I understand that to replace non-alphanumeric characters in a string, the code would be as follows:
words = re.sub("[^\w]", " ", str).split()
However, [^\w] only matches non-alphanumeric characters. I want to replace both non-alphabetic and numeric characters in a string like:
"baa!!!!! baa sheep23? baa baa"
and I want the outcome to look like this:
"baa baa sheep baa baa"
If I do words = re.sub("[^\\w\\d]", " ", str).split(), I still get numeric characters in the result, like 'sheep23'. I guess this is because "^" applies to \d as well, so it is treated as if I wanted the non-numeric characters removed. How do I do this right?
Use str.translate (the two-argument call below is the Python 2 form):
>>> from string import punctuation, digits
>>> s = "baa!!!!! baa sheep23? baa baa"
>>> s.translate(None, punctuation+digits)
'baa baa sheep baa baa'
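In Python 3, str.translate accepts only a translation table, so the call above raises a TypeError there. A minimal sketch of the equivalent, assuming Python 3 and the same input string (the names table and cleaned are only illustrative):

```python
from string import punctuation, digits

s = "baa!!!!! baa sheep23? baa baa"

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", punctuation + digits)
cleaned = s.translate(table)
print(cleaned)  # baa baa sheep baa baa
```

Note that this only strips ASCII punctuation and digits; other symbols would pass through unchanged.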
No need for a regex here; a simple comprehension will work:
>>> import string
>>> word = "baa!!!!! baa sheep23? baa baa"
>>> "".join([l for l in word if l in string.ascii_letters+string.whitespace])
'baa baa sheep baa baa'
Try this regex:
[^a-zA-Z]
This matches anything that is not a letter.
Or this if you want to keep whitespace:
[^a-zA-Z\s]
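The answer gives only the patterns, so here is a rough sketch of how they might be plugged into re.sub, using the string from the question (the variable names letters_only and words are only illustrative):

```python
import re

s = "baa!!!!! baa sheep23? baa baa"

# Delete everything that is neither an ASCII letter nor whitespace.
letters_only = re.sub(r"[^a-zA-Z\s]", "", s)
print(letters_only)  # baa baa sheep baa baa

# Or replace every non-letter with a space and split into words.
words = re.sub(r"[^a-zA-Z]", " ", s).split()
print(words)  # ['baa', 'baa', 'sheep', 'baa', 'baa']
```

Raw strings (the r prefix) keep the backslash in \s from being interpreted by Python before the regex engine sees it.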
What about this regex?
[^\w]|\d
EDIT:
As @Avinash said, this does not remove _. If you also want to remove _, use:
[^\w]|[\d_]
And if you also want to collapse each run of removed characters into a single space, use:
([^\w]|[\d_])+
Here's your example with underscores added:
In [1]: import re
In [2]: s = "baa!!!!! baa sheep23? baa baa___"
In [3]: re.sub("([^\w]|[\d_])+", " ", s)
Out[3]: 'baa baa sheep baa baa '
In [4]: re.sub("([^\w]|[\d_])+", " ", s).split()
Out[4]: ['baa', 'baa', 'sheep', 'baa', 'baa']
With the re.sub function:
>>> import re
>>> s = "baa!!!!! baa sheep23? baa baa"
>>> m = re.sub(r'[^A-Za-z ]', "", s)
>>> m
'baa baa sheep baa baa'
Instead of replacing every non-letter with a space and then splitting, you can do it all in one go:
>>> re.split("[^a-zA-Z]+", "baa!!!!! baa sheep23? baa baa")
['baa', 'baa', 'sheep', 'baa', 'baa']
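One caveat with the one-pass split: if the string starts or ends with non-letters, re.split returns empty strings at the edges. The sketch below shows this on a made-up input and, as an alternative not taken from the answer above, uses re.findall to collect the runs of letters directly:

```python
import re

# Hypothetical input with non-letters at both ends.
s = "!!baa baa sheep23?"

# Splitting on non-letter runs leaves empty strings at the boundaries.
split_words = re.split(r"[^a-zA-Z]+", s)
print(split_words)  # ['', 'baa', 'baa', 'sheep', '']

# Matching the letter runs instead avoids the empty entries.
found_words = re.findall(r"[a-zA-Z]+", s)
print(found_words)  # ['baa', 'baa', 'sheep']
```

Filtering the split result with a comprehension such as [w for w in split_words if w] would work as well.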
[^\w] is equivalent to [^a-zA-Z0-9_] (modulo language settings). You need to keep in your character class only what you want to keep, and note that [^a-zA-Z] also matches spaces.
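To make the point about \w concrete, here is a small sketch, assuming Python 3, where str patterns are Unicode-aware unless re.ASCII is passed. It shows that \d is already contained in \w, which is why [^\w\d] behaves exactly like [^\w] in the question (the sample string is made up for illustration):

```python
import re

s = "baa!!!!! baa sheep23? baa baa"

# \d is a subset of \w, so adding it to the negated class changes nothing.
same_result = re.sub(r"[^\w\d]", " ", s) == re.sub(r"[^\w]", " ", s)
print(same_result)  # True

# With re.ASCII, \w is exactly [a-zA-Z0-9_]; without it, accented
# letters such as U+00E9 count as word characters too.
sample = "a_1\u00e9"
ascii_matches = re.findall(r"\w", sample, flags=re.ASCII)
unicode_matches = re.findall(r"\w", sample)
print(ascii_matches)    # ['a', '_', '1']
print(unicode_matches)  # ['a', '_', '1', 'é']
```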