Python Regex - Remove special characters but preserve apostrophes

Question

I'm trying to remove all special characters from some text. Here is my regex:

import re

pattern = re.compile(r'[\W_]+', re.UNICODE)
words = str(pattern.sub(' ', words))

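Super simple, but unfortunately it causes problems when apostrophes (single quotes) are used. For example, if I have the word "doesn't", this code returns "doesn".

Is there a way to adapt this regex so that it doesn't remove apostrophes in cases like this?

Edit: here is what I'm after: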

 doesn't this mean it -technically- works?

should be:

doesn't this mean it technically works

Answer 1

Like this?

>>> pattern = re.compile(r"[^\w']")
>>> pattern.sub(' ', "doesn't it rain today?")
"doesn't it rain today "

If underscores should also be filtered out:

>>> re.compile(r"[^\w']|_").sub(" ", "doesn't this _technically_ means it works? naïve I am ...")
"doesn't this  technically  means it works  naïve I am    "

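To land exactly on the output requested in the question, the substitution can collapse each run of unwanted characters into one space and then trim the ends. A minimal sketch, assuming Python 3 (where str patterns are Unicode-aware by default); the names NON_WORD and clean are hypothetical:

```python
import re

# One or more characters that are either an underscore or neither a
# word character nor an apostrophe. The trailing + makes a whole run
# of such characters collapse into a single replacement.
NON_WORD = re.compile(r"(?:[^\w']|_)+")

def clean(text):
    """Replace runs of special characters with one space and trim the ends."""
    return NON_WORD.sub(" ", text).strip()

print(clean("doesn't this mean it -technically- works?"))
# doesn't this mean it technically works
```

Without the + quantifier and the final strip(), the result keeps doubled and trailing spaces, as in the interactive session above.
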
Answer 2

I was able to parse your sample into a list of words using this regex: [a-z]*'?[a-z]+ .

Then you can just join the elements of the list with a space.
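
The find-then-join approach could look roughly like the sketch below. The re.IGNORECASE flag and the names WORD and keep_words are additions for illustration, not part of the answer; note that the character class only covers ASCII letters, so digits and accented letters are dropped.

```python
import re

# Zero or more letters, an optional apostrophe, then one or more letters.
# IGNORECASE is an assumption so that capitalised words are kept too.
WORD = re.compile(r"[a-z]*'?[a-z]+", re.IGNORECASE)

def keep_words(text):
    """Extract the words from text and join them with single spaces."""
    return " ".join(WORD.findall(text))

print(keep_words("doesn't this mean it -technically- works?"))
# doesn't this mean it technically works
```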

Answer 3

How about

re.sub(r"[^\w' ]", "", "doesn't this mean it -technically- works?")
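
A short sketch of how this call behaves, wrapped in a hypothetical helper named strip_specials. Because the unwanted characters are deleted rather than replaced with a space, words separated only by punctuation are fused together, and underscores are kept since \w matches them.

```python
import re

def strip_specials(text):
    # Delete every character that is not a word character,
    # an apostrophe, or a space.
    return re.sub(r"[^\w' ]", "", text)

print(strip_specials("doesn't this mean it -technically- works?"))
# doesn't this mean it technically works
print(strip_specials("well-known"))
# wellknown
```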

Answer 4

How about ([^\\w']|_)+ ?

Note that this will not work well with something like:

doesn't this mean it 'technically' works?

which might not be quite what you're after.
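
One possible way around the quoted-word case, offered only as a sketch and not as part of the answer: first drop apostrophes that do not sit between two word characters, then collapse the remaining special characters. The helper name is hypothetical, and the approach assumes that only word-internal apostrophes matter, so a possessive plural such as "dogs'" loses its trailing apostrophe.

```python
import re

def clean_keep_inner_apostrophes(text):
    # Step 1: replace apostrophes that are not preceded by a word
    # character, or not followed by one, with a space.
    text = re.sub(r"(?<!\w)'|'(?!\w)", " ", text)
    # Step 2: collapse runs of underscores and other non-word,
    # non-apostrophe characters into one space, then trim the ends.
    return re.sub(r"(?:[^\w']|_)+", " ", text).strip()

print(clean_keep_inner_apostrophes("doesn't this mean it 'technically' works?"))
# doesn't this mean it technically works
```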

4 answers

Answer 1: score 13, accepted, 2012-07-09 21:44:01
Answer 2: score 1, 2012-07-09 21:43:10
Answer 3: score 0, 2012-07-09 21:44:35
Answer 4: score 0, 2012-07-09 21:44:36