
Python: Remove soft hyphen(s)

In an HTML file, I've got words containing soft hyphens, e.g.:

"Schilde rung"
repr(word) = "Schilde\xc2\xadrung"

How can I remove them?

Since my file also contains umlauts and other special characters, solutions based on printable or on words.decode('ascii', 'ignore') aren't suitable, because they would strip those characters as well.

I already tried words.replace('\xc2\xad', ''), but this didn't work.

Thanks for any help :)

You can't run replace on a list; you need to run it on each member of the list:

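# Python 2: these are byte strings, so the soft hyphen shows up as the UTF-8 bytes \xc2\xad
words = ["Hello", "Schilde\xc2\xadrung"]
# Apply replace to each string in the list, not to the list itself
words = [word.replace('\xc2\xad', '') for word in words]
print repr(words)
# Prints ['Hello', 'Schilderung']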
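
The snippet above relies on Python 2 byte strings. As a rough sketch for Python 3, assuming the HTML file is UTF-8 encoded, one option is to decode the bytes first and then remove the soft hyphen as the single character U+00AD. The names SOFT_HYPHEN and strip_soft_hyphens and the sample bytes are made up for illustration and are not from the original post:

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD; its UTF-8 encoding is the byte pair C2 AD


def strip_soft_hyphens(text):
    """Return text with every soft hyphen removed; other characters are untouched."""
    return text.replace(SOFT_HYPHEN, "")


# Hypothetical bytes as they might be read from a UTF-8 encoded HTML file.
raw = b"Schilde\xc2\xadrung f\xc3\xbcr M\xc3\xa4dchen"

decoded = raw.decode("utf-8")          # assumption: the file really is UTF-8
cleaned = strip_soft_hyphens(decoded)  # umlauts survive, soft hyphen is gone

# The same list-comprehension pattern as above, applied to decoded strings.
words = [strip_soft_hyphens(w) for w in decoded.split()]

print(cleaned)
print(words)
```

Because only U+00AD is targeted, umlauts and other non-ASCII characters are left alone, which is the concern raised in the question about ASCII-based filtering.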
