In an HTML file, I've got words containing soft hyphens, e.g.
"Schilde rung"
repr(word) = "Schilde\xc2\xadrung"
How can I remove them?
Since my file also contains umlauts and other special characters, solutions using printable or words.decode('ascii', 'ignore') aren't very good, because they would strip those characters as well.
I already tried words.replace('\xc2\xad', ''), but this didn't work.
Thanks for any help :)
You can't run replace on a list; you need to run it on each member of the list:
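# Python 2: these are byte strings, and '\xc2\xad' is the UTF-8 encoding of the soft hyphen (U+00AD)
words = ["Hello", "Schilde\xc2\xadrung"]
# Apply replace to each string in the list, not to the list itself
words = [word.replace('\xc2\xad', '') for word in words]
print repr(words)
# Prints ['Hello', 'Schilderung']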
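The snippet above is Python 2 and works on byte strings. For comparison, here is a minimal sketch of the same idea in Python 3, assuming the file is UTF-8 encoded. The names SOFT_HYPHEN and strip_soft_hyphens, and the sample word with an umlaut, are made up for illustration and are not from the question. Decoding first and then replacing the single character U+00AD leaves umlauts and other non-ASCII characters intact:

```python
# Python 3 sketch; assumes the input bytes are UTF-8 encoded.
SOFT_HYPHEN = "\u00ad"  # U+00AD, which UTF-8 encodes as the bytes b"\xc2\xad"


def strip_soft_hyphens(words):
    """Return a new list with soft hyphens removed from every word."""
    return [word.replace(SOFT_HYPHEN, "") for word in words]


raw = b"Schilde\xc2\xadrung"    # bytes as they might be read from the file
decoded = raw.decode("utf-8")   # decode to text before cleaning

# The third word is a hypothetical example containing an umlaut (u-umlaut, U+00FC).
cleaned = strip_soft_hyphens(["Hello", decoded, "M\u00fcll\u00adabfuhr"])
# cleaned == ['Hello', 'Schilderung', 'M\u00fcllabfuhr']
```

Note that in Python 3 the literal '\xc2\xad' inside a str means two separate characters, not the soft hyphen, so the replacement target has to be "\u00ad" once the data has been decoded.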