
python string encoding unicode

I'm using Python 2.7 and I have problems converting characters like "ä" to "ae".

I'm retrieving the content of a webpage using:

req = urllib2.Request(url + str(questionID))
response = urllib2.urlopen(req)
data = response.read()

After that I do some extraction, and that is where my problem occurs.

extractedStr = pageContent[start:end]  # this string contains the "ä"!
extractedStr = extractedStr.decode("utf8")  # here I get the error; I tried encode as well
extractedStr = extractedStr.replace(u"ä", "ae")

--> 'utf8' codec can't decode byte 0xe4 in position 13: invalid continuation byte

But my simple test works fine:

someStr = "geräusch"
someStr = someStr.decode("utf8")
someStr = someStr.replace(u"ä", "ae")

I have the feeling it has something to do with WHEN I call .decode(). I tried it at several positions, with no success :(

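Use .decode("latin-1") instead. The byte 0xe4 in the error message is "ä" in Latin-1 (ISO-8859-1); in UTF-8, "ä" would be the two bytes 0xc3 0xa4. So the page is very likely encoded as Latin-1 (or a close relative such as Windows-1252), and that is what you are trying to decode. Your simple test presumably works because the literal in your script or terminal is stored as UTF-8.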
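
To illustrate, here is a minimal sketch that reproduces the problem without any network access. The byte string raw is a hypothetical stand-in for the slice of response.read(); it assumes the page really is Latin-1, which the 0xe4 byte suggests but does not prove. The code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the extracted bytes: "Geräusch" encoded as
# Latin-1, where "ä" is the single byte 0xE4 (an assumption about the page).
raw = b"Ger\xe4usch"

# Decoding as UTF-8 fails: 0xE4 starts a multi-byte sequence, and the
# next byte ("u") is not a valid continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Latin-1 gives the intended text, and the replacement works.
text = raw.decode("latin-1")
result = text.replace(u"\xe4", u"ae")

print(utf8_ok)
print(result)
```

If the server sends a charset in its Content-Type header, reading the encoding from there is more robust than hard-coding "latin-1".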


 