Using regex to remove digits from a string
I am trying to remove all digits from a string that are not attached to a word. Examples:
"python 3" => "python"
"python3" => "python3"
"1something" => "1something"
"2" => ""
"434" => ""
"python 35" => "python"
"1 " => ""
" 232" => ""
So far I am using the following regular expression:
((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)
It handles some of the examples above correctly, but not all. Any help, with some explanation?
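To see where the original pattern goes wrong, here is a quick check against the examples from the question (a sketch; the helper `check` is hypothetical). The `^[0-9]$` alternative only matches a string that is exactly one digit, so "434" survives, and "1 " is missed because its digit has no space before it and the string does not end right after it. The lookbehind alternatives do remove the digits in "python 3" and " 232", but leave the neighbouring space behind.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r'((?<=[ ])[0-9]+(?=[ ])|(?<=[ ])[0-9]+|^[0-9]$)')

# (input, expected output) pairs taken from the question.
EXAMPLES = [
    ("python 3", "python"),
    ("python3", "python3"),
    ("1something", "1something"),
    ("2", ""),
    ("434", ""),
    ("python 35", "python"),
    ("1 ", ""),
    (" 232", ""),
]


def check(pattern):
    """Return (input, actual, expected) for every example the pattern gets wrong."""
    failures = []
    for text, expected in EXAMPLES:
        got = pattern.sub("", text)
        if got != expected:
            failures.append((text, got, expected))
    return failures


if __name__ == "__main__":
    for text, got, expected in check(ORIGINAL):
        print(f"{text!r} => {got!r} (expected {expected!r})")
```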
Why not just use word boundaries?
\b\d+\b
Here is an example:
>>> import re
>>> words = ['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232']
>>> for word in words:
...     print("'{}' => '{}'".format(word, re.sub(r'\b\d+\b', '', word)))
...
'python 3' => 'python '
'python3' => 'python3'
'1something' => '1something'
'2' => ''
'434' => ''
'python 35' => 'python '
'1 ' => ' '
' 232' => ' '
Note that this does not remove the spaces before or after the number. I would advise calling strip() on the result; otherwise, you can probably use \b\d+\b\s* (to also remove the space after the number) or something similar.
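A minimal sketch of the strip() suggestion: remove the standalone numbers with \b\d+\b, then trim whitespace from both ends of the result. The name `remove_standalone_numbers` is hypothetical. Keep in mind that strip() only touches the ends of the string, so a number in the middle of a sentence still leaves a double space behind.

```python
import re

# A run of digits with a word boundary on each side, i.e. not attached
# to letters, digits, or underscores.
STANDALONE_NUMBER = re.compile(r"\b\d+\b")


def remove_standalone_numbers(text: str) -> str:
    """Delete numbers that are not attached to a word, then trim both ends."""
    return STANDALONE_NUMBER.sub("", text).strip()


if __name__ == "__main__":
    for sample in ["python 3", "python3", "1something", "2",
                   "434", "python 35", "1 ", " 232"]:
        print(f"{sample!r} => {remove_standalone_numbers(sample)!r}")
```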
You could just split the string into words and drop any word that consists only of digits, which is a lot easier to read:
new = " ".join([w for w in s.split() if not w.isdigit()])
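Wrapped in a self-contained function (the name `drop_numeric_words` is hypothetical), the split-based approach looks like this. Because it rebuilds the string with single spaces, it also trims the ends and collapses any run of whitespace (tabs, newlines, repeated spaces) into one space, which may or may not be what you want. Also note that str.isdigit() accepts some characters that \d does not, such as superscript digits like "²".

```python
def drop_numeric_words(text: str) -> str:
    """Split on whitespace, drop tokens made only of digits, rejoin with single spaces."""
    return " ".join(w for w in text.split() if not w.isdigit())


if __name__ == "__main__":
    for sample in ["python 3", "python3", "1something", "2",
                   "434", "python 35", "1 ", " 232"]:
        print(f"{sample!r} => {drop_numeric_words(sample)!r}")
```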
It also seems to be faster:
In [27]: p = re.compile(r'\b\d+\b')
In [28]: s = " ".join(['python 3', 'python3', '1something', '2', '434', 'python 35', '1 ', ' 232'])
In [29]: timeit " ".join([w for w in s.split() if not w.isdigit()])
100000 loops, best of 3: 1.54 µs per loop
In [30]: timeit p.sub('', s)
100000 loops, best of 3: 3.34 µs per loop
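Outside IPython, where the timeit magic is not available, a rough equivalent uses the standard-library `timeit` module. This is only a sketch (the function names are hypothetical); absolute numbers depend on your machine and Python version, so no timings are claimed here.

```python
import re
import timeit

PATTERN = re.compile(r"\b\d+\b")
# Same test string as in the IPython session.
S = " ".join(['python 3', 'python3', '1something', '2', '434',
              'python 35', '1 ', ' 232'])


def split_version() -> str:
    return " ".join([w for w in S.split() if not w.isdigit()])


def regex_version() -> str:
    return PATTERN.sub("", S)


def best_per_call(func, number: int = 100_000, repeat: int = 3) -> float:
    """Best-of-`repeat` average time per call, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    for name, func in [("split/isdigit", split_version), ("re.sub", regex_version)]:
        print(f"{name}: {best_per_call(func) * 1e6:.2f} µs per loop")
```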
It also removes the surrounding spaces, as in your expected output:
In [39]: re.sub(r'\b\d+\b', '', " 2")
Out[39]: ' '
In [40]: " ".join([w for w in " 2".split() if not w.isdigit()])
Out[40]: ''
In [41]: re.sub(r'\b\d+\b', '', s)
Out[41]: 'python python3 1something python '
In [42]: " ".join([w for w in s.split() if not w.isdigit()])
Out[42]: 'python python3 1something python'
So the two approaches give noticeably different results.
This regex, (\s|^)\d+(\s|$), could work, as shown below in JavaScript:
var value = "1 3@bar @foo2 * 112";
var matches = value.replace(/(\s|^)\d+(\s|$)/g, "");
console.log(matches); // "3@bar @foo2 *"
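It works in three parts:
(\s|^) matches a whitespace character or the start of the string;
\d+ matches one or more digits;
(\s|$) matches a whitespace character or the end of the string.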
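If you have several lines, note that ^ and $ only match at the very start and end of the whole string by default. Adding \n as an alternative, as in (\s|$|\n), is redundant because \s already matches a newline; instead, add the m (multiline) flag, e.g. /(\s|^)\d+(\s|$)/gm, so that ^ and $ also match at line boundaries. Hope this is what you're looking for.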
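For comparison with the other answers, here is the same pattern ported to Python's re.sub (a sketch; the name `remove_spaced_numbers` is hypothetical, and no /g flag is needed because re.sub replaces all matches by default). Because the pattern consumes the whitespace around the number, it has two side effects worth checking against your data: a number between two words glues those words together, and when two numbers are separated by a single space, the first match consumes that space, so the second number is no longer preceded by whitespace and survives.

```python
import re

# Port of the JavaScript pattern /(\s|^)\d+(\s|$)/g.
# re.sub replaces every non-overlapping match, like the /g flag.
SPACED_NUMBER = re.compile(r"(\s|^)\d+(\s|$)")


def remove_spaced_numbers(text: str) -> str:
    """Remove digit runs delimited by whitespace or the string edges, plus that whitespace."""
    return SPACED_NUMBER.sub("", text)


if __name__ == "__main__":
    samples = ["python 3", "434", "1 ", " 232", "python3",
               "1 3@bar @foo2 * 112",  # the JavaScript example
               "python 3 is great",    # both surrounding spaces are consumed
               "1 2"]                  # the second number is not removed
    for sample in samples:
        print(f"{sample!r} => {remove_spaced_numbers(sample)!r}")
```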