I'm using Python 2.7 and I'm having trouble converting characters like "ä" to "ae".
I'm retrieving the content of a webpage using:
import urllib2
req = urllib2.Request(url + str(questionID))
response = urllib2.urlopen(req)
data = response.read()
After that I extract a substring from the page content, and that is where my problem occurs.
extractedStr = pageContent[start:end]  # this string contains the "ä"!
extractedStr = extractedStr.decode("utf8")  # the error is raised here; I tried encode as well
extractedStr = extractedStr.replace(u"ä", "ae")
--> 'utf8' codec can't decode byte 0xe4 in position 13: invalid continuation byte
But my simple test works fine:
someStr = "geräusch"
someStr = someStr.decode("utf8")
someStr = someStr.replace(u"ä", "ae")
I have the feeling it has something to do with WHEN I call the .decode() function. I tried it at several positions, without success :(
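Use .decode("latin-1") instead. That is the encoding of the bytes you are trying to decode: the byte 0xe4 in the error message is "ä" in Latin-1, whereas UTF-8 encodes "ä" as the two bytes 0xc3 0xa4.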
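To illustrate the difference, here is a minimal sketch that should run on both Python 2.7 and Python 3. The helper name replace_umlaut and the sample byte strings are made up for the example; they stand in for the extracted page content, which is assumed to be Latin-1 encoded.

```python
# Hypothetical helper for illustration; not part of the original code.
def replace_umlaut(raw_bytes, encoding="latin-1"):
    """Decode raw bytes with the given encoding, then replace the umlaut."""
    text = raw_bytes.decode(encoding)
    return text.replace(u"\xe4", u"ae")  # u"\xe4" is the character "ä"

# The same word encoded in two different ways (sample data).
latin1_bytes = b"ger\xe4usch"    # Latin-1: "ä" is the single byte 0xe4
utf8_bytes = b"ger\xc3\xa4usch"  # UTF-8: "ä" is the two bytes 0xc3 0xa4

print(replace_umlaut(latin1_bytes))         # geraeusch
print(replace_umlaut(utf8_bytes, "utf-8"))  # geraeusch

# Decoding Latin-1 bytes as UTF-8 fails, as in the question.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)
```

This would also explain why the simple test works: if the script file is saved as UTF-8, the literal "geräusch" already holds UTF-8 bytes, while the web page evidently delivers Latin-1 bytes.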