Remove only consecutive special characters but keep consecutive [a-zA-Z0-9] and single characters
How can I remove multiple consecutive occurrences of each special character in a string?
I can write code like this:
re.sub(r'\.\.+', ' ', string)
re.sub(r'@@+', ' ', string)
re.sub(r'\s\s+', ' ', string)
for individual characters and, at best, loop over all the characters in a list, like this:
import re
from string import punctuation
for i in punctuation:
    # two or more consecutive copies of the same punctuation character
    to = re.escape(i) * 2 + '+'
    string = re.sub(to, ' ', string)
but I'm sure there is a more efficient method.
I tried:
re.sub('[^a-zA-Z0-9][^a-zA-Z0-9]+', ' ', '\n\n.AAA.x.@@+*@#=..xx000..x..\t.x..\nx*+Y.')
but it removes every run of special characters, even when the characters differ, and keeps only single ones preceded by a letter.
The string may keep different consecutive special characters, like 99@aaaa*!@#$., but not repeated identical ones, like ++--...
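To make the problem with the attempt above concrete, here is a minimal sketch that runs it on the sample string. The variable names are illustrative only.

```python
import re

text = "\n\n.AAA.x.@@+*@#=..xx000..x..\t.x..\nx*+Y."

# Any two or more adjacent non-alphanumeric characters are replaced,
# whether or not they are the same character.
attempt = re.sub('[^a-zA-Z0-9][^a-zA-Z0-9]+', ' ', text)

print(repr(attempt))  # ' AAA.x xx000 x x x Y.'
```

Runs of different characters such as +*@#= are lost here, although they should be kept.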
A pattern to match all non-alphanumeric characters in Python is [\W_].
So, all you need is to wrap the pattern with a capturing group and add \1+ after it to match 2 or more consecutive occurrences of the same non-alphanumeric character:
text = re.sub(r'([\W_])\1+',' ',text)
In Python 3.x, if you wish to make the pattern ASCII-aware only, use the re.A or re.ASCII flag:
text = re.sub(r'([\W_])\1+',' ',text, flags=re.A)
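As a rough illustration of what the flag changes, the sketch below uses a made-up sample string containing a doubled non-ASCII letter. By default that letter counts as a word character and is left alone; with re.A it is treated as non-alphanumeric and the repeated run is replaced.

```python
import re

# "\u00ef" is the single code point for the letter i with diaeresis;
# the sample string is hypothetical and only serves the comparison.
text = "na\u00ef\u00efve!!"

# Default (Unicode-aware) matching: the letter is a word character.
unicode_aware = re.sub(r'([\W_])\1+', ' ', text)

# ASCII-only matching: the letter falls outside [a-zA-Z0-9_].
ascii_only = re.sub(r'([\W_])\1+', ' ', text, flags=re.A)

print(repr(unicode_aware))  # 'na\xef\xefve '
print(repr(ascii_only))     # 'na ve '
```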
Mind the use of the r prefix that defines a raw string literal (so that you do not have to escape the \ char).
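A small sketch of why the prefix matters, using a hypothetical sample string: in a normal string literal, \1 is an octal escape for the character with code 1, so the backreference is silently lost unless the backslash is doubled.

```python
import re

text = "a..b"

raw = r'([\W_])\1+'           # raw literal: backslashes reach the regex engine
escaped = '([\\W_])\\1+'      # the same pattern with doubled backslashes
no_backref = '([\\W_])\1+'    # here '\1' is chr(1), not a backreference

same_pattern = (raw == escaped)
with_raw = re.sub(raw, ' ', text)
without_raw = re.sub(no_backref, ' ', text)

print(same_pattern)  # True
print(with_raw)      # a b
print(without_raw)   # a..b
```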
See the regex demo. See the Python demo:
import re
text = "\n\n.AAA.x.@@+*@#=..xx000..x..\t.x..\nx*+Y."
print(re.sub(r'([\W_])\1+',' ',text))
Output:
.AAA.x. +*@#= xx000 x .x
x*+Y.