Using regex to remove digits that are not part of alphanumeric strings

Question

I have a string that looks like this:

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"

The end result I want is this:

>>> details
'_jeffybion5 John Dutch'

My current code removes all digits, including those attached to words, and it also drops the whitespace between words.

>>> import re
>>>
>>> details = "| 47574802757 | _jeffybion5                    | John Dutch                                                    |"
>>> details = re.sub("[0-9]", "", details)
>>> details = re.sub(" +", "", details)
>>> details = details.replace("|", " ")
>>> details
'  _jeffybion JohnDutch '

Any help achieving the desired result would be much appreciated.

Answer 1

Non-Regex Solution

One approach:

chunks = details.split()
res = " ".join(chunk for chunk in chunks if not chunk.isnumeric() and (chunk != "|"))
print(res)

Output

_jeffybion5 John Dutch
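The split-and-filter idea above can be wrapped in a small helper to see how it treats different tokens. This is only a minimal sketch: the function name `keep_non_numeric` and the second sample row are illustrative and not part of the original answer.

```python
def keep_non_numeric(row: str) -> str:
    """Drop pipe separators and purely numeric tokens, keep everything else."""
    tokens = row.split()  # split on any run of whitespace
    kept = [t for t in tokens if t != "|" and not t.isnumeric()]
    return " ".join(kept)


details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
print(keep_non_numeric(details))  # _jeffybion5 John Dutch

# Hypothetical extra row: digits attached to letters survive, bare numbers do not.
print(keep_non_numeric("| 42 | user7 | 2022 | Ann Lee |"))  # user7 Ann Lee
```

Note that this relies on every pipe being separated from its neighbours by whitespace, as in the sample data.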

Regex Solution

An alternative using re.findall:

res = " ".join(re.findall(r"\w*[a-zA-Z_]\w*", details))
print(res)

Output

_jeffybion5 John Dutch

A third alternative, also using re.findall:

res = " ".join(re.findall(r"\w*[^\W\d]\w*", details))

The pattern:

[^\W\d]

matches any word character that is not a digit.

The regex solutions assume that you want tokens made of letters, digits, and underscores that contain at least one letter (or underscore).
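To make the difference between the two findall patterns concrete, here is a small comparison. It is a sketch under assumptions: the first pattern is written with an explicit ASCII class `[a-zA-Z_]`, the constant names are made up, and the Greek sample row is hypothetical.

```python
import re

ASCII_PATTERN = r"\w*[a-zA-Z_]\w*"   # needs an ASCII letter or an underscore
UNICODE_PATTERN = r"\w*[^\W\d]\w*"   # needs any word character that is not a digit

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
print(re.findall(ASCII_PATTERN, details))    # ['_jeffybion5', 'John', 'Dutch']
print(re.findall(UNICODE_PATTERN, details))  # ['_jeffybion5', 'John', 'Dutch']

# Hypothetical row with a non-ASCII name: only the second pattern keeps it.
greek = "| 123 | αβγ7 | 42 |"
print(re.findall(ASCII_PATTERN, greek))      # []
print(re.findall(UNICODE_PATTERN, greek))    # ['αβγ7']
```

On the sample data both patterns behave the same; they only diverge once tokens contain letters outside A-Z and a-z.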

Answer 2

For the exact samples you have shown, please try the following regex.

^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)

Here is the online demo for the above regex.

Python 3 code: using the split function of Python 3's re module to get the required output.

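import re

# Sample input: one table row wrapped in newlines.
text = """
| 4655748765321 | _jeffybion5                    | John Dutch                                                    |
"""

# re.split keeps the text of the two capturing groups in its result.
pattern = r'^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)'
# Drop empty pieces, strip spaces/pipes/newlines, then discard the trailing leftover.
parts = [part.strip(' |\n') for part in re.split(pattern, text) if part][0:-1]
print(parts)

# Output:
['_jeffybion5', 'John Dutch']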

Explanation: The regex ^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*) defines 2 capturing groups, and re.split keeps the text matched by capturing groups in its result, which is how the two values are retrieved. The strip call then removes spaces, pipes, and newlines from each item. Finally, the slice drops the last item, which is the leftover after the match (the trailing pipe and newline) and is empty once stripped.
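Since the pattern already defines two capturing groups, the values can also be read directly from a match object instead of going through re.split. The following is a sketch of that variation, not the answerer's code; the variable names `text`, `username`, and `full_name` are chosen here for illustration.

```python
import re

text = """
| 4655748765321 | _jeffybion5                    | John Dutch                                                    |
"""

pattern = re.compile(r"^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)")
match = pattern.search(text)
if match:
    username = match.group(1)
    # Group 2 runs up to the last pipe, so it still carries trailing padding.
    full_name = match.group(2).strip()
    print([username, full_name])  # ['_jeffybion5', 'John Dutch']
```

Reading the groups avoids the filtering and slicing steps, at the cost of handling only one row per search.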

Answer 3

For the example data, you could replace each pipe, together with any surrounding whitespace, and optionally a following run of digits plus whitespace up to the next pipe.

Then strip the surrounding spaces.

\s*\|\s*(?:\d+\s*\|)?

Regex demo

import re

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
res = re.sub(r"\s*\|\s*(?:\d+\s*\|)?", " ", details).strip()
print(res)

Output

_jeffybion5 John Dutch
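To see what each part of the substitution pattern does, it can be spelled out with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is a sketch of the same pattern; the name `SEPARATOR` is made up for illustration.

```python
import re

# The same pattern as above, laid out so each part can carry a comment.
SEPARATOR = re.compile(
    r"""
    \s* \| \s*          # a pipe with optional whitespace on both sides
    (?: \d+ \s* \| )?   # optionally a digits-only field plus the pipe that closes it
    """,
    re.VERBOSE,
)

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
res = SEPARATOR.sub(" ", details).strip()
print(res)  # _jeffybion5 John Dutch
```

One consequence of the optional group consuming the closing pipe: in a row with two adjacent numeric fields, such as `| 12 | 34 | abc |`, the second number is not removed, so this approach fits the sample layout rather than arbitrary rows.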

If the fields you keep must contain a character in A-Za-z, you could split on | surrounded by optional whitespace and keep only the fields that contain one:

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |  | "
res = " ".join([m for m in re.split(r"\s*\|\s*", details) if re.search(r"[A-Za-z]", m)])
print(res)

Output

_jeffybion5 John Dutch
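The split-and-check approach extends naturally to several rows. Below is a minimal sketch assuming one table row per line; the helper name `clean_row`, the compiled pattern names, and the second row of the table are invented for illustration.

```python
import re

FIELD_SEPARATOR = re.compile(r"\s*\|\s*")
HAS_LETTER = re.compile(r"[A-Za-z]")


def clean_row(row: str) -> str:
    """Split a row on pipes and keep only fields containing an ASCII letter."""
    fields = FIELD_SEPARATOR.split(row)
    return " ".join(f for f in fields if HAS_LETTER.search(f))


# Hypothetical multi-row table; the second row is invented for illustration.
table = """\
| 4655748765321 | _jeffybion5   | John Dutch    |
| 99            | x9            | 12 | Ann      |"""

for line in table.splitlines():
    print(clean_row(line))
# _jeffybion5 John Dutch
# x9 Ann
```

Because the split happens only at pipes, the space inside a field such as John Dutch is preserved, while empty and digits-only fields are filtered out.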

3 answers

Answer 1
3 2022-07-29 12:17:42

Answer 2
2 2022-07-29 13:08:37

Answer 3
2 2022-07-29 14:11:55