
How to exclude a character from a regex group?

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (Python). How can I change this regular expression to match any non-alphanumeric character except the hyphen?

re.compile(r'[\W_]')

Thanks.

You could just use a negated character class instead:

re.compile(r"[^a-zA-Z0-9-]")

This matches any character that is not an ASCII letter, an ASCII digit, or a hyphen. Like your current regex, it also matches the underscore.

>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'

Note that this also removes spaces, which may or may not be what you want.
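If you do want to keep spaces, one option (an assumption about your requirements, not part of the original answer) is to add a literal space to the negated class. A minimal sketch using the same sample string:

```python
import re

# Negated class: match anything that is NOT an ASCII letter, digit, space, or hyphen.
# The hyphen is placed last, right before "]", so it is treated literally, not as a range.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens  *#"
print(repr(keep_spaces.sub("", s)))  # 'sometextwith----5 hy-phens  '
```

Putting the hyphen at the very end (or start) of the class keeps it literal; escaping it as \- also works.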


Edit: SilentGhost has suggested that adding a quantifier may be cheaper for the engine to process, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+")

The + makes each run of consecutive matching characters match as a single unit, so the whole run is replaced in one substitution rather than one character at a time.
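The resulting string is the same with or without the +; what changes is how many separate matches the engine makes. One quick way to see this is re.Pattern.subn, which returns the new string together with the number of substitutions made. This sketch only counts matches; it does not measure speed, and any actual speedup will depend on your input.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

# subn returns a tuple: (new_string, number_of_substitutions_made)
out_single, n_single = single.subn("", s)
out_runs, n_runs = runs.subn("", s)

print(out_single == out_runs)  # True: same output either way
print(n_single, n_runs)        # 12 6
```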

\w matches alphanumerics (and the underscore), so add the hyphen and negate the whole set: r"[^\w-]"
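Two differences from the explicit-range version are worth checking. First, \w includes the underscore, so r"[^\w-]" keeps underscores, whereas the question asks to strip them. Second, for str patterns in Python 3, \w is Unicode-aware, so letters such as é survive, while the ASCII ranges remove them. The sketch below compares the patterns on a made-up sample string; the |_ alternation is one way (our addition, not from the answer) to also drop underscores.

```python
import re

ascii_only = re.compile(r"[^a-zA-Z0-9-]")  # only ASCII letters/digits and "-" survive
word_chars = re.compile(r"[^\w-]")         # \w also covers "_" and Unicode letters
no_underscore = re.compile(r"[^\w-]|_")    # like word_chars, but additionally strips "_"

s = "café-au_lait!"
for name, pattern in [("ascii_only", ascii_only),
                      ("word_chars", word_chars),
                      ("no_underscore", no_underscore)]:
    print(name, pattern.sub("", s))
# ascii_only caf-aulait
# word_chars café-au_lait
# no_underscore café-aulait
```

Passing re.ASCII as the flags argument restricts \w to [a-zA-Z0-9_], which makes the last pattern equivalent to the first.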
