Delete digits in Python (Regex)

Question

I'm trying to delete all digits from a string. However, the code below also deletes the digits contained in any word, and obviously I don't want that. I've tried many regular expressions with no success.

Thanks

import re

s = "This must not b3 delet3d, but the number at the end yes 134411"
s = re.sub(r"\d+", "", s)
print(s)

Result:

This must not b deletd, but the number at the end yes

Answer 1

Add a space before the \d+.

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.

s = re.sub(r"^\d+\s|\s\d+\s|\s\d+$", " ", s)
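
As a quick check of the combined pattern, the sketch below runs it on the question's string and on a second string with two adjacent numbers. The second input is an assumption added for illustration. It suggests that, because each alternative consumes the surrounding whitespace, a number that directly follows a removed number can survive.

```python
import re

# Pattern from the edit above, written as a raw string.
PATTERN = r"^\d+\s|\s\d+\s|\s\d+$"

question = "This must not b3 delet3d, but the number at the end yes 134411"
adjacent = "a 1 2 b"  # hypothetical input: two numbers side by side

cleaned_question = re.sub(PATTERN, " ", question)
cleaned_adjacent = re.sub(PATTERN, " ", adjacent)

print(repr(cleaned_question))  # trailing number replaced by one space
print(repr(cleaned_adjacent))  # the "2" is left in place
```

The whitespace after "1" is consumed by the first match, so nothing is left for the "2" to match against.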

Answer 2

Try this:

r"\b\d+\b"

That'll match only those digits that are not part of another word.
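
A minimal sketch of this pattern in use, assuming it is passed to re.sub as a raw string. The helper name remove_standalone_numbers and the second sample string are made up for illustration.

```python
import re

def remove_standalone_numbers(text):
    """Delete digit runs that have a word boundary on both sides."""
    return re.sub(r"\b\d+\b", "", text)

s = "This must not b3 delet3d, but the number at the end yes 134411"
result = remove_standalone_numbers(s)
print(repr(result))

# A number followed by punctuation is still matched, because the
# boundary sits between the last digit and the comma.
result_punct = remove_standalone_numbers("room 42, floor 3")
print(repr(result_punct))
```

Only the digits are removed; the whitespace around them stays, so the results can contain doubled or trailing spaces.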

Answer 3

Matching a literal space isn't very good, since it doesn't handle tabs and other whitespace, and even \s misses numbers that sit next to punctuation. A first cut at a better solution is:

re.sub(r"\b\d+\b", "", s)

Note that the pattern is a raw string because \b is normally the backspace escape for strings, and we want the special word boundary regex escape instead. A slightly fancier version is:

re.sub(r"^\d+\W+|\b\d+\b|\W+\d+$", "", s)

That tries to remove leading/trailing whitespace when there are digits at the beginning/end of the string. I say "tries" because if there are multiple numbers at the end then you still have some spaces.
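
To make the "tries" caveat concrete, here is a small sketch. It assumes the first alternative is meant to be anchored at the start of the string with ^, and it uses a made-up input with two numbers at the end. The whitespace cleanup at the bottom is one possible follow-up, not part of the answer.

```python
import re

# Assumes the first alternative is anchored with ^ (start of string).
FANCY = r"^\d+\W+|\b\d+\b|\W+\d+$"

sample = "12 apples and 7 pears 30 40"  # hypothetical input
stripped = re.sub(FANCY, "", sample)
print(repr(stripped))  # leftover double and trailing spaces

# One possible follow-up (not part of the answer): collapse whitespace runs.
normalized = " ".join(stripped.split())
print(repr(normalized))
```

The leading "12 " and the trailing " 40" are removed together with their whitespace, but "7" and "30" are matched by the middle alternative alone, which leaves their surrounding spaces behind.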

Answer 4

To handle digit strings at the beginning of a line as well:

s = re.sub(r"(^|\W)\d+", "", s)

Answer 5

You could try this:

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'

The same rule also applies to:

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"
re.sub(r"(\s\d+)", "", s)

Result:

'This must not b3 delet3d, but the number at the end yes'

Answer 6

To match only pure integers in a string:

\b(?<![0-9-])(\d+)(?![0-9-])\b

It does the right thing with this input, matching only the numbers after "million":

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

All of the other 8 regex answers on this page fail in various ways with that input.

The dash at the end of the first character class, [0-9-], preserves -007, and the dash in the second class preserves 8-.

Or use \d in place of 0-9 if you prefer.

at regex101

Can it be simplified?
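
The claim above can be checked with a short script. This is a sketch that feeds the sample input to re.findall; the names PURE_INT and text are chosen here for illustration.

```python
import re

# Pattern from this answer, compiled once as a raw string.
PURE_INT = re.compile(r"\b(?<![0-9-])(\d+)(?![0-9-])\b")

# The two-line sample input shown above.
text = (
    "max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 \n"
    "8m9n o0p2     million     0 22 333  4444"
)

# findall returns the contents of the single capture group.
matches = PURE_INT.findall(text)
print(matches)
```

Only the four standalone integers after "million" are returned; every token that mixes digits with letters or dashes is skipped.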

Answer 7

If your number is always at the end of your strings, try:

re.sub(r"\d+$", "", s)

Otherwise, you may try:

re.sub(r"(\s)\d+(\s)", r"\1\2", s)

You can adjust the back-references to keep only one or two of the spaces (\s matches any whitespace character).
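
A short sketch of adjusting the back-references, using the sample string with a number in the middle. It also shows why the replacement needs to be a raw string: without the r prefix, Python reads "\1\2" as two control characters.

```python
import re

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411"

# Keep both captured whitespace characters.
keep_both = re.sub(r"(\s)\d+(\s)", r"\1\2", s)
# Keep only the whitespace that preceded the number.
keep_first = re.sub(r"(\s)\d+(\s)", r"\1", s)

print(repr(keep_both))
print(repr(keep_first))

# Without the r prefix, "\1\2" is two control characters, not back-references.
not_backrefs = "\1\2"
print(not_backrefs == "\x01\x02")
```

The trailing 134411 is untouched in both cases because no whitespace follows it, which is why the answer handles the end-of-string case with a separate pattern.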

Answer 8

I don't know what your real situation looks like, but most of the answers look like they won't handle negative numbers or decimals:

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", "", s)

The above should also handle things like:

"This must not b3 delet3d, but the number at the end yes -134.411"

But this is still incomplete; you probably need a more complete definition of what you can expect to find in the files you need to parse.

Edit: it's also worth noting that '\b' changes depending on the locale/character set you are using, so you need to be a little careful with that.
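
To illustrate the point about \b and character sets, here is a small sketch assuming Python 3, where str patterns use Unicode matching by default and the re.ASCII flag restricts \w, \b and \d to ASCII. The sample text is made up.

```python
import re

# "é9 42": a digit glued to a non-ASCII letter, then a standalone number.
text = "\u00e9" + "9 42"

# Default (Unicode) matching: é is a word character, so there is no
# boundary between it and the 9.
unicode_matches = re.findall(r"\b\d+\b", text)

# ASCII matching: é is not a word character, so a boundary appears
# between it and the 9.
ascii_matches = re.findall(r"\b\d+\b", text, flags=re.ASCII)

print(unicode_matches)
print(ascii_matches)
```

The same pattern finds one number or two depending on the flag, so it helps to decide up front which notion of "word" the input needs.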

Answer 9

I had a light-bulb moment; I tried this and it works:

sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

Output:

aasdsa

Answer 10

Non-regex solution:

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

It splits on " ", checks whether each chunk is a number with str.isdigit(), then joins the chunks back together. More verbosely (not using a list comprehension):

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)

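As a variation on the non-regex approach in Answer 10, the sketch below wraps it in a function and calls split() with no argument so that tabs and repeated spaces are treated as separators too. The function name drop_numeric_tokens and both sample strings are assumptions for illustration.

```python
def drop_numeric_tokens(text):
    """Remove whitespace-separated tokens that consist only of digits."""
    # split() with no argument splits on any whitespace run, including tabs.
    return " ".join(tok for tok in text.split() if not tok.isdigit())

tabbed = "This must not b3 delet3d,\t4566 but the number at the end yes 134411"
cleaned_tabbed = drop_numeric_tokens(tabbed)
print(cleaned_tabbed)

# A token such as "42," is not purely digits, so it is kept.
cleaned_punct = drop_numeric_tokens("total 42, done")
print(cleaned_punct)
```

Note that this rejoins everything with single spaces, so the original spacing is not preserved, and numbers with attached punctuation are left alone.
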
Answer 11

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(r"\d*$", "", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

This will remove the digits at the end of the string.

11 answers

Answer 1: score 47, accepted, 2009-05-03 14:04:05
Answer 2: score 21, 2009-05-03 14:12:44
Answer 3: score 7, 2009-05-03 15:05:28
Answer 4: score 6, 2009-05-03 14:23:58
Answer 5: score 4, 2018-12-15 07:45:02
Answer 6: score 4, 2021-03-15 02:38:35
Answer 7: score 2, 2009-05-03 14:06:05
Answer 8: score 2, 2009-05-03 15:37:32
Answer 9: score 1, 2021-11-28 16:49:57
Answer 10: score 1, 2009-05-03 15:21:27
Answer 11: score -1, 2017-11-20 12:54:20
