How can I exclude a character from a regular expression group?

Question

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (Python). How can I change this regular expression to match any non-alphanumeric character except the hyphen?

import re
re.compile(r'[\W_]')

Thanks.
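For context, a quick check (assuming Python 3; the sample string is hypothetical) shows why the original pattern falls short: [\W_] treats the hyphen as just another non-word character, so it is stripped along with everything else.

```python
import re

# The asker's original pattern: any non-word character, plus the underscore.
original = re.compile(r"[\W_]")

sample = "hy-phen_ated text!"  # hypothetical input for illustration
stripped = original.sub("", sample)
print(stripped)  # hyphenatedtext  (the hyphen is gone too)
```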

Answer 1

You could just use a negated character class instead:

re.compile(r"[^a-zA-Z0-9-]")

This matches anything that is not an ASCII letter, an ASCII digit, or a hyphen. Like your current regex, it also matches the underscore.

>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'

Notice that this also removes spaces (which may well be what you want).
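If you do want to keep spaces, one option (a sketch, not part of the original answer) is to add a literal space to the negated class. The hyphen stays last in the class so it is read literally rather than as a range.

```python
import re

# A literal space is added to the negated class so spaces survive.
# The hyphen stays last so it is a literal character, not a range operator.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens  *#"
result = keep_spaces.sub("", s)
print(repr(result))
```

Only the symbols are removed; runs of spaces are kept as they are, including the two trailing ones.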

Edit: SilentGhost suggested that adding a quantifier is likely cheaper for the engine to process, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+")

The + causes each run of consecutive matching characters to be matched (and replaced) as a single unit, instead of one character at a time.
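One way to see what the + changes is to count substitutions with Pattern.subn, which returns the new string together with the number of replacements made. This is only an illustration on the sample string from above; it does not measure speed, and any real performance difference will depend on the input.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

# Pattern.subn returns (new_string, number_of_substitutions_made)
text_single, n_single = single.subn("", s)
text_runs, n_runs = runs.subn("", s)

print(text_single == text_runs)  # True: the output is identical
print(n_single, n_runs)          # 12 6
```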

Answer 2

\w matches alphanumerics, so add the hyphen and then negate the entire set: r"[^\w-]"
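Note that \w also matches the underscore, so unlike the asker's original [\W_], this pattern leaves underscores in place. In Python 3, \w in a str pattern also matches non-ASCII letters and digits unless re.ASCII is given. The sketch below (Python 3 assumed; the extra |_ alternative and the sample word are illustrative additions, not part of the original answer) shows both effects.

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

# \w includes "_", so [^\w-] leaves underscores in place.
kept_underscore = re.sub(r"[^\w-]", "", s)

# The "|_" alternative also strips underscores, like the original [\W_].
no_underscore = re.sub(r"[^\w-]|_", "", s)

# \w is Unicode-aware for str patterns; re.ASCII restricts it to [a-zA-Z0-9_].
word = "café-au-lait!"
unicode_kept = re.sub(r"[^\w-]", "", word)
ascii_only = re.sub(r"[^\w-]", "", word, flags=re.ASCII)

print(kept_underscore, no_underscore, unicode_kept, ascii_only)
```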

2 answers

Answer 1
25 · accepted · 2010-11-05 17:54:08

Answer 2
6 · 2010-11-05 17:57:07
