Remove digits outside alphanumeric characters using regex

I have a string that looks like this:

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"

The end result I want is this:

>>> details
'_jeffybion5 John Dutch'

My current code removes all digits, including those attached to words, and it also removes the whitespace between two or more words.

>>> import re
>>>
>>> details = "| 47574802757 | _jeffybion5                    | John Dutch                                                    |"
>>> details = re.sub("[0-9]", "", details)
>>> details = re.sub(" +", "", details)
>>> details = details.replace("|", " ") 
>>> details
'  _jeffybion JohnDutch '

Any help achieving the desired result would be really appreciated.

Non-Regex Solution

One approach:

chunks =  details.split()
res = " ".join(chunk for chunk in chunks if not chunk.isnumeric() and (chunk != "|"))
print(res)

Output

_jeffybion5 John Dutch
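
The token-filtering idea above can be wrapped in a small helper to make its behaviour easier to check. This is only a sketch: the function name `clean_row` and the second input row are made up for illustration and do not come from the question.

```python
def clean_row(row: str) -> str:
    """Keep whitespace-separated tokens that are neither all digits nor a bare pipe."""
    tokens = row.split()  # split() with no argument also collapses runs of spaces
    kept = [t for t in tokens if not t.isnumeric() and t != "|"]
    return " ".join(kept)


details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
print(clean_row(details))  # _jeffybion5 John Dutch

# Hypothetical row: a standalone number inside a text field is dropped as well.
print(clean_row("| 42 | a1 | 7 up |"))  # a1 up
```

The second call shows a limitation of working on whitespace-separated tokens: any token made only of digits is removed, even when it belongs to a text field.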

Regex Solution

An alternative using re.findall:

res = " ".join(re.findall(r"\w*[a-zA-Z]\w*", details))
print(res)

Output

_jeffybion5 John Dutch

A third alternative, also using re.findall:

res = " ".join(re.findall(r"\w*[^\W\d]\w*", details))

The pattern:

[^\W\d]

matches any word character that is not a digit, that is, a letter or an underscore.

The regex solutions are based on the idea that you want strings composed of letters, digits, and underscores that contain at least one letter (or underscore).
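
To see how the two findall patterns differ, the sketch below runs both on a made-up row. It assumes the letter class is written `[a-zA-Z]`; the sample string and the constant names are hypothetical.

```python
import re

# Requires at least one ASCII letter somewhere in the token.
ASCII_LETTER = re.compile(r"\w*[a-zA-Z]\w*")
# Requires at least one word character that is not a digit (letter or underscore).
NON_DIGIT_WORD = re.compile(r"\w*[^\W\d]\w*")

sample = "| 123 | _42 | x9 |"  # hypothetical row, not from the question

print(ASCII_LETTER.findall(sample))    # ['x9']
print(NON_DIGIT_WORD.findall(sample))  # ['_42', 'x9']
```

Both patterns skip the purely numeric field. Only the second keeps `_42`, because an underscore counts as a non-digit word character.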

With your exact samples as shown, please try the following regex.

^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)


Python 3 code: using the split function of Python 3's re module to get the required output.

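import re

# Sample input; the triple-quoted string adds a newline before and after the row.
text = """
| 4655748765321 | _jeffybion5                    | John Dutch                                                    |
"""
# re.split keeps the text captured by the two groups in its result.
parts = re.split(r'^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)', text)
# Strip spaces, pipes, and newlines from each non-empty piece, then drop the last (empty) item.
result = [p.strip(' |\n') for p in parts if p][0:-1]
print(result)

# Output:
['_jeffybion5', 'John Dutch']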

Explanation: The regex ^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*) creates two capturing groups, and re.split keeps the text they capture in its result. strip then removes the surrounding spaces, pipes, and newlines from each piece. Finally, the last item of the list, which is empty after stripping, is dropped.
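
Because the pattern already defines two capturing groups, the values can also be read straight from a match object. The sketch below swaps re.split for re.search; this is a variation for illustration, not the code from the answer above, and the names `ROW`, `username`, and `full_name` are hypothetical.

```python
import re

# Same pattern as above: skip the first (numeric) column, capture the next two.
ROW = re.compile(r"^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)")

row = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"

match = ROW.search(row)
if match:
    username = match.group(1)
    full_name = match.group(2).strip()  # the second group keeps trailing spaces
    print(username, full_name)  # _jeffybion5 John Dutch
```

This avoids the post-processing of empty list items, at the cost of handling the no-match case explicitly.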

For the example data, you might remove each pipe surrounded by optional whitespace characters, optionally together with digits followed by whitespace up to the next pipe.

Then strip the surrounding spaces.

\s*\|\s*(?:\d+\s*\|)?


details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
res = re.sub(r"\s*\|\s*(?:\d+\s*\|)?", " ", details).strip()
print(res)

Output

_jeffybion5 John Dutch
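
The substitution pattern can be easier to read when it is written with re.VERBOSE, which ignores whitespace in the pattern and allows comments. A minimal sketch follows; the names `PIPE_AND_ID` and `strip_table_row` are hypothetical.

```python
import re

PIPE_AND_ID = re.compile(
    r"""
    \s* \| \s*          # a pipe with optional whitespace on both sides
    (?: \d+ \s* \| )?   # optionally, digits and whitespace up to the next pipe
    """,
    re.VERBOSE,
)


def strip_table_row(row: str) -> str:
    """Replace each separator (and a leading numeric column) with one space."""
    return PIPE_AND_ID.sub(" ", row).strip()


details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |"
print(strip_table_row(details))  # _jeffybion5 John Dutch
```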

If each field you keep should contain at least one character in A-Za-z, you could split on | surrounded by optional whitespace characters and check each piece for it:

details = "| 4655748765321 | _jeffybion5                    | John Dutch                                                    |  | "
res = " ".join([m for m in re.split(r"\s*\|\s*", details) if re.search(r"[A-Za-z]", m)])
print(res)

Output

_jeffybion5 John Dutch
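
The same split-and-filter step can be applied line by line when the input holds several rows. This is a sketch under assumptions: the second row is invented sample data, and `fields_with_letters` is a hypothetical helper name.

```python
import re

# The first row is from the question; the second is made-up sample data.
table = """\
| 4655748765321 | _jeffybion5 | John Dutch |
| 47574802757   | mary_k      | Mary King  |
"""


def fields_with_letters(row: str) -> list[str]:
    """Split a row on pipes and keep only fields containing an ASCII letter."""
    return [f for f in re.split(r"\s*\|\s*", row) if re.search(r"[A-Za-z]", f)]


rows = [fields_with_letters(line) for line in table.splitlines()]
print(rows)  # [['_jeffybion5', 'John Dutch'], ['mary_k', 'Mary King']]
```

Keeping the fields as a list instead of joining them preserves the column boundaries, which may be useful if the username and the full name are needed separately.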
