How to exclude a character from a regex group?
I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (Python). How can I change this regular expression to match any non-alphanumeric character except the hyphen?
re.compile(r'[\W_]')
Thanks.
You could just use a negated character class instead:
re.compile(r"[^a-zA-Z0-9-]")
This matches anything that is neither in the alphanumeric ranges nor a hyphen. Like your current regex, it also matches the underscore.
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'
Notice that this also replaces spaces (which may well be what you want).
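If the spaces should survive, one option is to add a space to the negated class. The following is only a sketch of that idea; the name keep_spaces is made up for illustration and is not part of the original answer.

```python
import re

# Assumption: spaces should be kept, so a literal space is added to the class.
# The hyphen stays last inside the brackets so it is read as a literal.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens *#"
cleaned = keep_spaces.sub("", s)

# The space before "*#" is kept too, so the result ends with a space.
print(repr(cleaned))
```

Calling cleaned.strip() afterwards would remove the leftover leading or trailing spaces, if that matters for your data.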
Edit: SilentGhost has suggested that it may be cheaper for the engine to process the string with a quantifier, in which case you can simply use:
re.compile(r"[^a-zA-Z0-9-]+")
The + simply causes each run of consecutive matching characters to be matched (and replaced) as a single match.
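To see what the quantifier changes, re.subn can be used to count the substitutions each pattern performs. This is a small sketch using the sample string from above; the variable names are illustrative only, and it says nothing about actual timing.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

# subn returns (new_string, number_of_substitutions)
out_single, n_single = single.subn("", s)
out_runs, n_runs = runs.subn("", s)

print(out_single == out_runs)  # same result when replacing with ""
print(n_single, n_runs)        # fewer substitutions with the quantifier
```

With an empty replacement both patterns give the same string. With a non-empty replacement they differ: the version with + inserts one replacement per run rather than one per character.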
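\w matches alphanumeric characters; add the hyphen, then negate the whole set: r"[^\w-]"
Note that \w also matches the underscore, so unlike [\W_] this pattern leaves underscores in place.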
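The \w-based class and the explicit ranges are not interchangeable. The sketch below compares them on a made-up sample string, assuming Python 3 str patterns, where \w is Unicode-aware unless re.ASCII is passed.

```python
import re

s = "te_xt-caf\u00e9 #1"  # contains an underscore and a non-ASCII letter

explicit = re.compile(r"[^a-zA-Z0-9-]")
word_based = re.compile(r"[^\w-]")              # \w covers Unicode letters, digits and "_"
word_ascii = re.compile(r"[^\w-]", re.ASCII)    # \w restricted to [a-zA-Z0-9_]
word_no_us = re.compile(r"[^\w-]|_", re.ASCII)  # additionally strip the underscore

r_explicit = explicit.sub("", s)
r_word = word_based.sub("", s)
r_ascii = word_ascii.sub("", s)
r_no_us = word_no_us.sub("", s)

for name, value in [("explicit", r_explicit), ("word_based", r_word),
                    ("word_ascii", r_ascii), ("word_no_us", r_no_us)]:
    print(name, repr(value))
```

Only the last pattern reproduces the behaviour of the explicit ranges on this input; which one is appropriate depends on whether underscores and non-ASCII letters should be kept.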