Replace all Non-Alphanumeric Characters except one particular pattern using RegEx in Python

Suppose I have some text like this:

text = "xyz - aabc 123.56 cancer s15.2 date 12/03/2021 @ dd hospital www.someurl ocr.5rror 123.sometext"

Now I want to create a regex that replaces every non-alphanumeric character with a space, except a dot (.) surrounded by digits. The final text should look like this:

"xyz aabc 123.56 cancer s15.2 date 12 03 2021 dd hospital www someurl ocr 5rror 123 sometext"

I have this regex, which finds those matches: re.findall(r"(\b[a-z0-9]*\d\.\d[a-z0-9]*\b)", text) gives me ['123.56', 's15.2'], but I am not able to get the text above.

Thanks in advance.

You can use re.sub and a pattern with a capture group:

(\d+(?:\.\d+))|\W+

The pattern matches:

  • (\d+(?:\.\d+)) Capture 1+ digits followed by a dot and 1+ digits in group 1
  • | Or
  • \W+ Match 1+ non-word characters to replace with a single space (or use the negated character class [^a-zA-Z0-9]+ to also match an underscore, which \W does not match)
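To illustrate the underscore remark in the last bullet, here is a small sketch comparing the two alternatives. The sample string and variable names are made up for this illustration and are not part of the question.

```python
import re

# Made-up sample containing an underscore, a decimal number and a hyphen.
sample = "snake_case 3.14 a-b"

# \W treats "_" as a word character, so the underscore survives.
with_w = re.sub(r"(\d+(?:\.\d+))|\W+",
                lambda m: m.group(1) or " ", sample)

# The negated character class also replaces "_" with a space.
with_class = re.sub(r"(\d+(?:\.\d+))|[^a-zA-Z0-9]+",
                    lambda m: m.group(1) or " ", sample)

print(with_w)      # snake_case 3.14 a b
print(with_class)  # snake case 3.14 a b
```

Which variant is preferable depends on whether underscores should count as alphanumeric in your data.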

In the replacement, keep the capture group when it matched; otherwise replace the match of 1+ non-word characters with a single space.

See a regex demo and a Python demo.

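import re

s = "xyz - aabc 123.56 cancer s15.2 date 12/03/2021 @ dd hospital www.someurl ocr.5rror 123.sometext"
# Group 1 captures digits.digits (kept as-is); \W+ matches runs of non-word characters.
pattern = r"(\d+(?:\.\d+))|\W+"
# Return group 1 when it matched, otherwise a single space.
print(re.sub(pattern, lambda x: x.group(1) if x.group(1) else " ", s))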

Output

xyz aabc 123.56 cancer s15.2 date 12 03 2021 dd hospital www someurl ocr 5rror 123 sometext
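As a possible alternative, the same result can probably be reached without a capture group by using lookarounds, so that a plain string replacement is enough. This is only a sketch and not part of the answer above; the helper name keep_decimal_dots is hypothetical.

```python
import re

# Hypothetical helper, not from the answer above: replace every run of
# characters that are neither word characters nor a "digit.digit" dot.
def keep_decimal_dots(text: str) -> str:
    pattern = (
        r"(?:"
        r"[^\w.]"      # any non-word character other than a dot
        r"|\.(?!\d)"   # a dot NOT followed by a digit
        r"|(?<!\d)\."  # a dot NOT preceded by a digit
        r")+"          # collapse a whole run into one match
    )
    return re.sub(pattern, " ", text)

s = "xyz - aabc 123.56 cancer s15.2 date 12/03/2021 @ dd hospital www.someurl ocr.5rror 123.sometext"
print(keep_decimal_dots(s))
# xyz aabc 123.56 cancer s15.2 date 12 03 2021 dd hospital www someurl ocr 5rror 123 sometext
```

One difference to be aware of: for input such as "v1.2.3" this sketch keeps both dots, whereas the capture-group pattern gives "v1.2 3", because the second dot is no longer part of a digits.digits match.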
