
Converting to UTF-8 in Python

I am using Ubuntu and Python 3.4 to download data from Wikipedia's API. I am saving the names, and I noticed that characters in some languages are not saved correctly.

For example: 日の火曜日 is saved as æ¥ã®ç«ææ¥.

I figured I may not be saving it in utf-8 so I changed my code to

fd = io.open("filename",'w',encoding='utf8')
fd.write(str(name.encode('utf-8'), 'utf-8'))

But I still get the same result.

The API I am using is here.

What I understand up to this point is that UTF-8 should be able to handle text in all languages. Also, the console in Ubuntu uses UTF-8 by default, so it should print the strings correctly if I run a command like more.

Check the response code, get the charset from the response, and use it to decode. You can use name.decode('utf-8') or str(name, 'utf-8') to decode.

For example:

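from urllib.request import urlopen

resp = urlopen(url)  # url is the API request URL
if resp.code == 200:
    name = resp.read()  # raw bytes of the response body
    # Use the charset declared by the server, falling back to UTF-8
    charset = resp.info().get_content_charset() or 'utf-8'
    with open('filename', 'w', encoding='utf-8') as fd:
        fd.write(str(name, charset))
        # or fd.write(name.decode(charset))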
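The decode-then-write step can be tried without a network call. The following is only a sketch under stated assumptions: decode_body is a hypothetical helper, the Message object stands in for what resp.info() returns (the headers object from urlopen is a subclass of email.message.Message), and the bytes stand in for resp.read().

```python
import os
import tempfile
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode bytes with the charset declared in the headers, or a fallback."""
    charset = headers.get_content_charset() or fallback
    return raw.decode(charset)


# Stand-in for resp.info(); a real response carries its own Content-Type.
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-8"

# Stand-in for resp.read(): the UTF-8 bytes of the example string.
raw = "日の火曜日".encode("utf-8")

name = decode_body(raw, headers)

# Write with an explicit encoding so the result does not depend on the locale.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w", encoding="utf-8") as fd:
    fd.write(name)

# Read the file back to confirm the text survived the round trip.
with open(path, encoding="utf-8") as fd:
    round_tripped = fd.read()
```

Passing encoding='utf-8' to open() matters mainly on systems whose default locale encoding is not UTF-8. If the server declares no charset, the helper falls back to UTF-8, which is an assumption rather than a guarantee.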

The problem turned out to be my SSH client, which was displaying the strings incorrectly. The code was working fine.
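One way to tell a display problem from a storage problem is to inspect the bytes rather than trust the terminal. The sketch below is illustrative only. It assumes the viewer interpreted UTF-8 bytes as Latin-1; the post does not say which encoding the SSH client used, but that assumption reproduces the garbled text from the question.

```python
import locale
import sys

original = "日の火曜日"
utf8_bytes = original.encode("utf-8")  # what a correctly written file contains

# What a viewer shows if it wrongly interprets those bytes as Latin-1.
garbled = utf8_bytes.decode("latin-1")

# Terminals usually hide the C1 control characters, so keep printable ones only.
visible = "".join(ch for ch in garbled if ch.isprintable())

# The data is intact: undoing the wrong interpretation recovers the text.
recovered = garbled.encode("latin-1").decode("utf-8")

# Byte reprs are ASCII-safe, so this prints correctly on any terminal.
print(utf8_bytes)
print("stdout encoding:", sys.stdout.encoding)
print("locale encoding:", locale.getpreferredencoding(False))
```

If the file's bytes match the expected UTF-8 sequence while the terminal still shows garbage, the fault lies in the terminal or SSH client settings, not in the Python code.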

