I have the following text, which I want to convert to a desired format using a Python regex:
text = "' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'"
I used the following code:
import re
reg = re.compile(r"[^\w']")
text = reg.sub(' ', text)
However, it gives the output text = "'PowerPoint PresentationOctober 11th 2011 Visit to Lap Chec1Edit or delete me in â viewâ then â slide masterâ'",
which is not what I want.
My desired output is text = '"PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in view then slide master.'"
I want to remove all special characters except the following: []()-,.
Rather than removing the chars, you may fix them using the right encoding:
text = text.encode('windows-1252').decode('utf-8')
# => ' PowerPoint PresentationOctober 11th, 2011Visit to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'
If you want to remove them later, it becomes much easier, e.g. text.replace('‘', '').replace('’', '') or re.sub(r'[‘’]+', '', text).
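To illustrate the round trip, here is a minimal sketch. It assumes the curly quotes were originally UTF-8 bytes that got decoded as windows-1252; the sample string and the variable names are made up for the example.

```python
import re

# Hypothetical sample: simulate the damage by decoding UTF-8 bytes as windows-1252.
original = "me in ‘view’ then ’slide master’."
garbled = original.encode("utf-8").decode("windows-1252")

# Reverse the mistake: recover the raw bytes, then decode them with the right codec.
fixed = garbled.encode("windows-1252").decode("utf-8")

# With real curly quotes restored, stripping them is a one-liner.
cleaned = re.sub(r"[‘’]+", "", fixed)
print(cleaned)  # me in view then slide master.
```

Note that this only works if the string really is mis-decoded; on text that already contains proper curly quotes, the encode/decode step is likely to raise a UnicodeDecodeError.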
I found the answer, and it turned out to be simple. Thanks for the replies.
reg = re.compile(r"[^\w'\,\.\(\)\[\]]")
text = reg.sub(' ', text)
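As a possible variation (not the poster's exact code), the sketch below deletes the unwanted characters instead of replacing them with spaces, and also keeps whitespace and the hyphen that the question listed. The names `keep` and `cleaned` are just illustrative.

```python
import re

text = "' PowerPoint PresentationOctober 11th, 2011(Visit) to Lap Chec1Edit or delete me in ‘view’ then ’slide master’.'"

# Whitelist: word characters, whitespace, apostrophes, and , . ( ) [ ] -
# The hyphen goes last in the class so it is read literally, not as a range.
keep = re.compile(r"[^\w\s',.()\[\]-]")

# Delete everything outside the whitelist (here, only the curly quotes).
cleaned = keep.sub("", text)
print(cleaned)
```

Deleting rather than substituting a space avoids the doubled spaces and the stray space before the final period that the space-based version leaves behind.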