Using a Python regex to extract alphanumeric text from among some special characters

Question

I have the following text, and I want to convert it to the desired format using a Python regular expression:

text = "' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in â€˜viewâ€™ then â€™slide masterâ€™.'"

I am using the following code:

import re

reg = re.compile(r"[^\w']")
text = reg.sub(' ', text)

However, it produces text = "'PowerPoint PresentationOctober 11th 2011 Visit to Lap Chec1Edit or delete me in â viewâ then â slide masterâ'", which is not the output I want.

The output I want is text = '"PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in view then slide master.'". I want to remove all special characters except []()-,.

Answer 1

Instead of removing the characters, you can repair them by using the correct encoding:

text = text.encode('windows-1252').decode('utf-8')
# => ' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'

See the Python demo.

If you want to remove them afterwards, it becomes much easier, for example with text.replace('‘', '').replace('’', '') or re.sub(r'[‘’]+', '', text).
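
To make the two steps above concrete, here is a minimal sketch that repairs the encoding and then strips the curly quotes. The function name repair_and_strip and the fallback behaviour are my own assumptions, not part of the answer: if the text is not UTF-8 that was misread as Windows-1252, the round trip raises an error, and the sketch simply leaves the text unchanged. The curly quotes are written as \u escapes so they cannot be silently turned into straight quotes by copy and paste.

```python
import re


def repair_and_strip(text: str) -> str:
    """Undo UTF-8-read-as-Windows-1252 mojibake, then drop curly single quotes.

    Hypothetical helper for illustration only.
    """
    try:
        # Re-encode to the bytes that were misread, then decode them correctly.
        fixed = text.encode("windows-1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Assumption: text that does not survive the round trip is left as it is.
        fixed = text
    # \u2018 and \u2019 are the left and right curly single quotes.
    return re.sub("[\u2018\u2019]+", "", fixed)


sample = "Edit or delete me in â€˜viewâ€™ then â€™slide masterâ€™."
print(repair_and_strip(sample))
```

Text that is already clean passes through unchanged, so the helper should be safe to apply to a mix of damaged and undamaged strings.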

Answer 2

I found the answer; it turned out to be quite simple, as shown below. Thanks for your replies.

reg = re.compile(r"[^\w'\,\.\(\)\[\]]")
text = reg.sub(' ', text)
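
A possible variant of the pattern above, offered as a sketch rather than as the asker's method. In Python 3, \w is Unicode-aware for str patterns, so letters such as â count as word characters and survive the substitution. The sketch below assumes only ASCII letters and digits are wanted and therefore passes re.ASCII; it also keeps the hyphen, which the question lists but the pattern above omits, and it deletes disallowed characters instead of replacing them with spaces. The names ALLOWED_ONLY and keep_allowed are hypothetical.

```python
import re

# Match anything that is NOT an ASCII word character, whitespace, an apostrophe,
# or one of the punctuation marks the question wants to keep: [ ] ( ) - , .
# The hyphen is escaped so it is read as a literal, not as a range.
ALLOWED_ONLY = re.compile(r"[^\w\s'\[\]()\-,.]+", re.ASCII)


def keep_allowed(text: str) -> str:
    """Delete every character outside the allowed set (hypothetical helper)."""
    return ALLOWED_ONLY.sub("", text)


print(keep_allowed("11th, 2011(Visit) [a-b] #x!"))
```

Because whitespace is kept and the unwanted characters are deleted outright, no stray spaces appear before punctuation such as the final full stop.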

2 solutions

Solution 1
1 2019-03-27 12:18:19

Solution 2
-1 2019-03-27 12:26:45