Python: Remove soft hyphen(s)

Question

In an HTML file, I've got words containing soft hyphens, e.g.

"Schilde rung"
repr(word) = "Schilde\xc2\xadrung"

How can I remove them?

Since my file also contains umlauts and other special characters, solutions based on printable or on words.decode('ascii', 'ignore') aren't a good fit, because they would strip those characters too.

I already tried words.replace('\xc2\xad', ''), but this didn't work.

Thanks for any help :)

Answer 1

You can't run replace on a list; you need to run it on each member of the list:

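# '\xc2\xad' is the UTF-8 byte sequence for the soft hyphen (U+00AD).
words = ["Hello", "Schilde\xc2\xadrung"]
# str.replace works on one string at a time, so apply it to each list element.
words = [word.replace('\xc2\xad', '') for word in words]
print(repr(words))
# Prints ['Hello', 'Schilderung']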
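
The snippet above works on byte strings, as in Python 2. On Python 3, one possible approach is to decode the bytes to text first and then remove the single character U+00AD. This is only a sketch: it assumes the file is UTF-8 encoded (which the \xc2\xad bytes suggest), and strip_soft_hyphens is a made-up helper name, not something from the original answer.

```python
# Sketch for Python 3. Assumes the input is UTF-8; strip_soft_hyphens is a
# hypothetical helper name chosen for this example.
SOFT_HYPHEN = "\u00ad"  # U+00AD, encoded in UTF-8 as the bytes C2 AD


def strip_soft_hyphens(raw_words):
    """Return the words as text with every soft hyphen removed."""
    cleaned = []
    for raw in raw_words:
        # Decode byte strings; leave already-decoded text untouched.
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        cleaned.append(text.replace(SOFT_HYPHEN, ""))
    return cleaned


words = [b"Hello", b"Schilde\xc2\xadrung", "M\u00fcll\u00adtonne"]
print(strip_soft_hyphens(words))
```

Because only U+00AD is replaced, umlauts and other non-ASCII characters are left alone, unlike with decode('ascii', 'ignore').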

1 answer

solution1
4 ACCEPTED 2013-09-06 21:21:51
