Remove digits outside alphanumeric characters using regex
I have a string that looks like this:
details = "| 4655748765321 | _jeffybion5 | John Dutch |"
The end result I want is this:
>>> details
'_jeffybion5 John Dutch'
My current code removes all digits, including those attached to words, and also drops the whitespace between words that belong together.
>>> import re
>>>
>>> details = "| 47574802757 | _jeffybion5 | John Dutch |"
>>> details = re.sub("[0-9]", "", details)
>>> details = re.sub(" +", "", details)
>>> details = details.replace("|", " ")
>>> details
'  _jeffybion JohnDutch '
Any help achieving the desired result would be really appreciated.
One approach:
chunks = details.split()
res = " ".join(chunk for chunk in chunks if not chunk.isnumeric() and (chunk != "|"))
print(res)
Output:
_jeffybion5 John Dutch
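As a rough sketch of why this works: str.split() with no arguments splits on runs of whitespace, so every pipe and every word becomes its own chunk. The helper name clean_details is made up for illustration, and the approach assumes the pipes are always surrounded by whitespace.

```python
def clean_details(details: str) -> str:
    """Drop purely numeric chunks and pipe separators (hypothetical helper)."""
    chunks = details.split()  # splits on any run of whitespace
    kept = [c for c in chunks if not c.isnumeric() and c != "|"]
    return " ".join(kept)


details = "| 4655748765321 | _jeffybion5 | John Dutch |"
print(details.split())
# ['|', '4655748765321', '|', '_jeffybion5', '|', 'John', 'Dutch', '|']
print(clean_details(details))
# _jeffybion5 John Dutch
```

Because the words are re-joined with single spaces, "John" and "Dutch" end up next to each other again even though they were split apart.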
An alternative using re.findall:
res = " ".join(re.findall(r"\w*[a-zA-Z_]\w*", details))
print(res)
Output:
_jeffybion5 John Dutch
A third alternative, also using re.findall:
res = " ".join(re.findall(r"\w*[^\W\d]\w*", details))
The pattern [^\W\d] matches any word character that is not a digit, that is, a letter or an underscore.
The regex solutions are based on the idea that you want strings composed of letters, digits, and underscores that contain at least one letter (or underscore).
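To make the difference between the two regex ideas concrete, here is a small sketch comparing an ASCII-only class such as [a-zA-Z_] with [^\W\d]. The sample string and variable names are invented for illustration; the only assumption about Python is that \w is Unicode-aware for str patterns, which is the default in Python 3.

```python
import re

ascii_pattern = r"\w*[a-zA-Z_]\w*"   # needs at least one ASCII letter or underscore
unicode_pattern = r"\w*[^\W\d]\w*"   # needs at least one non-digit word character

# Invented sample: pure digits, mixed tokens, a lone underscore, a non-ASCII letter.
sample = "| 123 | a1 | 99x | _ | Ω2 |"

ascii_tokens = re.findall(ascii_pattern, sample)
unicode_tokens = re.findall(unicode_pattern, sample)

print(ascii_tokens)    # ['a1', '99x', '_']
print(unicode_tokens)  # ['a1', '99x', '_', 'Ω2']
```

Both patterns skip the purely numeric field; they only differ on tokens whose letters are all outside the ASCII range.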
With the exact samples you showed, please try the following regex.
^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)
Here is the online demo for the above regex.
Python3 code: using the split function of Python 3's re module to get the required output.
import re
# Create the input string.
var = """
| 4655748765321 | _jeffybion5 | John Dutch |
"""
# Split on the regex (the two capturing groups are kept in the result),
# strip spaces, pipes and newlines from each item, then drop the last, empty item.
[s.strip(' |\n') for s in re.split(r'^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)', var) if s][0:-1]
# Output:
['_jeffybion5', 'John Dutch']
Explanation: The regex ^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*) creates 2 capturing groups, which re.split keeps in its result so that we can fetch the values later. The strip call then removes the remaining spaces, pipes and newlines from each item. Finally, the last item of the list, which is empty after stripping, is removed.
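Since the regex already defines two capturing groups, the values can also be read directly from a match object instead of going through re.split. This is only a sketch of that variation; the names username and full_name are assumptions about what the two fields mean.

```python
import re

pattern = re.compile(r"^[^|]*\|[^|]*\|\s+(\S+)\s+\|\s+([^|]*)")

details = "| 4655748765321 | _jeffybion5 | John Dutch |"
match = pattern.search(details)
if match:
    # Group 2 captures up to the next pipe, so trailing whitespace is stripped.
    username, full_name = (g.strip() for g in match.groups())
    print(username)   # _jeffybion5
    print(full_name)  # John Dutch
```

Like the split version, this relies on the line having exactly the layout of the sample: a first field to skip, then a field without spaces, then a final field.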
For the example data, you might remove each pipe together with its optional surrounding whitespace chars, and optionally also remove digits followed by whitespace chars up to the next pipe.
Then strip the surrounding spaces.
\s*\|\s*(?:\d+\s*\|)?
details = "| 4655748765321 | _jeffybion5 | John Dutch |"
res = re.sub(r"\s*\|\s*(?:\d+\s*\|)?", " ", details).strip()
print(res)
Output:
_jeffybion5 John Dutch
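To see what the substitution does before the final strip, the two steps can be separated as in the sketch below. The variable names replaced and res are arbitrary.

```python
import re

details = "| 4655748765321 | _jeffybion5 | John Dutch |"

# Step 1: each pipe (plus an optional digits-only field after it) becomes one space.
replaced = re.sub(r"\s*\|\s*(?:\d+\s*\|)?", " ", details)
print(repr(replaced))  # '  _jeffybion5 John Dutch '

# Step 2: remove the leftover leading and trailing spaces.
res = replaced.strip()
print(res)  # _jeffybion5 John Dutch
```

The first match swallows "| 4655748765321 |" as a whole, which is why the digits-only field disappears while the digit in "_jeffybion5" is left alone.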
If there should be a char A-Za-z in the string, you could split on | between whitespace chars and check for it:
details = "| 4655748765321 | _jeffybion5 | John Dutch | | "
res = " ".join([m for m in re.split(r"\s*\|\s*", details) if re.search(r"[A-Za-z]", m)])
print(res)
Output:
_jeffybion5 John Dutch
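Printing the intermediate list shows what the split produces before filtering. This is just the same code broken into steps, with fields and kept as arbitrary names.

```python
import re

details = "| 4655748765321 | _jeffybion5 | John Dutch | | "

# Split on pipes, swallowing the whitespace around them.
fields = re.split(r"\s*\|\s*", details)
print(fields)
# ['', '4655748765321', '_jeffybion5', 'John Dutch', '', '']

# Keep only the fields that contain at least one ASCII letter.
kept = [f for f in fields if re.search(r"[A-Za-z]", f)]
print(kept)
# ['_jeffybion5', 'John Dutch']
```

Because the split happens on the pipes rather than on whitespace, "John Dutch" stays a single field, and the empty fields are discarded by the letter check.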