Want to extract alphanumeric text with certain special characters using a Python regex
I have the following text, and I want to get it into the required format using a Python regular expression:
text = "' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'"
I am using the following code:
import re
reg = re.compile(r"[^\w']")
text = reg.sub(' ', text)
However, it gives this output: text = "'PowerPoint PresentationOctober 11th 2011 Visit to Lap Chec1Edit or delete me in â viewâ then â slide masterâ'"
This is not the desired output.
The output I want is: text = "'PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in view then slide master.'"
I want to remove all special characters except []()-,.
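A minimal sketch of why the stray "â" survives in the output above. It assumes the input really contained a mis-decoded curly quote (the three characters â, € and ˜), which is inferred from the output rather than shown in the question. In Python 3, `\w` is Unicode-aware for `str` patterns, so "â" counts as a word character while "€" and "˜" do not.

```python
import re

# Assumed sample: a mis-decoded left curly quote followed by a word.
# It displays as: â€˜view
sample = "\u00e2\u20ac\u02dcview"

# Default (Unicode) matching: 'â' is a word character, so it is kept.
unicode_result = re.sub(r"[^\w']", " ", sample)

# With re.ASCII, \w only means [a-zA-Z0-9_], so 'â' is replaced too.
ascii_result = re.sub(r"[^\w']", " ", sample, flags=re.ASCII)

print(repr(unicode_result))
print(repr(ascii_result))
```

So the pattern itself behaves as documented; the leftover characters point to an encoding problem in the input rather than a regex problem.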
Rather than deleting the characters, you can repair them by using the correct encoding:
text = text.encode('windows-1252').decode('utf-8')
# => ' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'
See the Python demo.
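To make the round trip concrete, here is a small sketch that first reproduces the damage and then reverses it. The fragment and the helper name `fix_mojibake` are illustrative assumptions, not part of the original answer. The helper returns the input unchanged when the text is not Windows-1252 mojibake, because the encode/decode pair raises a `UnicodeError` subclass in that case.

```python
# Correctly decoded fragment, with real curly quotes.
clean = "in \u2018view\u2019 then \u2019slide master\u2019."

# Simulate the damage: UTF-8 bytes wrongly read as Windows-1252.
garbled = clean.encode("utf-8").decode("windows-1252")


def fix_mojibake(s: str) -> str:
    """Hypothetical helper: undo UTF-8-read-as-cp1252 damage if present."""
    try:
        return s.encode("windows-1252").decode("utf-8")
    except UnicodeError:
        # Not mojibake of this kind (or not encodable), so leave it alone.
        return s


repaired = fix_mojibake(garbled)
print(repaired == clean)
```

Applying the bare encode/decode line to text that is already correct would raise an exception, which is why the sketch wraps it in a `try` block.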
If you want to remove them afterwards, that becomes much easier, for example text.replace('‘', '').replace('’', '')
or re.sub(r'[‘’]+', '', text).
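A short sketch of the two removal options, assuming the text has already been repaired so that it contains real curly quotes (U+2018 and U+2019). The sample fragment is an assumption for illustration.

```python
import re

# Assumed repaired fragment containing real curly quotes.
repaired = "in \u2018view\u2019 then \u2019slide master\u2019."

# Option 1: chained str.replace calls, one per quote character.
by_replace = repaired.replace("\u2018", "").replace("\u2019", "")

# Option 2: a single regex with a character class covering both quotes.
by_regex = re.sub(r"[\u2018\u2019]+", "", repaired)

print(by_replace)
print(by_regex)
```

Both options give the same string; the regex form scales better if more characters need to be dropped later.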
I found the answer myself; it turned out to be simple, as shown below. Thanks for your reply.
reg = re.compile(r"[^\w',.()\[\]]")
text = reg.sub(' ', text)
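One possible variation on the pattern above, offered as a sketch rather than as the asker's method. It makes two assumptions: the hyphen from the stated keep-list `[]()-,.` should be allowed too, and disallowed characters should be deleted instead of replaced with a space, so that no double spaces are left behind. The function name `keep_allowed` is hypothetical.

```python
import re

# Keep word characters, whitespace, apostrophes and , . ( ) [ ] -
# The hyphen is placed last in the class so it is taken literally.
DISALLOWED = re.compile(r"[^\w\s',.()\[\]-]")


def keep_allowed(text: str) -> str:
    """Hypothetical helper: delete every character outside the keep-list."""
    return DISALLOWED.sub("", text)


sample = "2011(Visit) to Lap in \u2018view\u2019 then \u2019slide master\u2019."
cleaned = keep_allowed(sample)
print(cleaned)
```

Because `\w` is Unicode-aware, this only gives clean results on correctly decoded text; mis-decoded input would still leave characters such as "â" behind.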