How do I remove special characters from the end of every word in a string?
I want it to match only the end of every word.
Example:
"i am test-ing., i am test.ing-, i am_, test_ing,"
Output should be:
"i am test-ing i am test.ing i am test_ing"
>>> import re
>>> test = "i am test-ing., i am test.ing-, i am_, test_ing,"
>>> re.sub(r'([^\w\s]|_)+(?=\s|$)', '', test)
'i am test-ing i am test.ing i am test_ing'
This matches one or more punctuation characters or underscores ( [^\w\s]|_ ) followed by either a whitespace character ( \s ) or the end of the string ( $ ). The (?= ) construct is a lookahead assertion: it checks that the whitespace (or end of string) follows, but does not include it in the match, so the whitespace is not replaced. Only the ([^\w\s]|_)+ part is removed.
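To see why the lookahead matters, here is a small comparison (a sketch reusing the test string from the question) of the same pattern with and without it. When the space is captured by an ordinary group instead of a lookahead, it becomes part of the match and is deleted along with the punctuation, so neighbouring words get glued together.

```python
import re

test = "i am test-ing., i am test.ing-, i am_, test_ing,"

# Lookahead: the following space is checked but not consumed, so it survives.
with_lookahead = re.sub(r'([^\w\s]|_)+(?=\s|$)', '', test)

# Ordinary group: the space is consumed and removed together with the punctuation.
without_lookahead = re.sub(r'([^\w\s]|_)+(\s|$)', '', test)

print(with_lookahead)     # i am test-ing i am test.ing i am test_ing
print(without_lookahead)  # i am test-ingi am test.ingi amtest_ing
```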
Okay, but why [^\w\s]|_, you ask? The first part, [^\w\s], matches any character that is neither a word character ( \w, i.e. a letter, digit, or underscore) nor whitespace ( \s ), i.e. punctuation and other symbols. However, we also want to remove underscores, which \w treats as word characters, so we add them back with |_.
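A quick way to check which characters each piece of the pattern picks up is re.findall on a made-up sample string (the sample is an arbitrary assumption, chosen to contain a hyphen, an underscore, spaces, and a few symbols). Note that \W alone would also match the spaces, which is exactly what [^\w\s] avoids.

```python
import re

sample = "a-b_c d.e! 9?"

# Neither a word character nor whitespace: punctuation/symbols only.
punct_only = re.findall(r'[^\w\s]', sample)
# Adding |_ also catches the underscore that \w would otherwise protect.
punct_or_underscore = re.findall(r'[^\w\s]|_', sample)
# Plain \W also matches whitespace, which we want to keep.
non_word = re.findall(r'\W', sample)

print(punct_only)           # ['-', '.', '!', '?']
print(punct_or_underscore)  # ['-', '_', '.', '!', '?']
print(non_word)             # ['-', ' ', '.', '!', ' ', '?']
```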