Converting to utf-8 python

Question

I am using Ubuntu and Python 3.4 to download data from Wikipedia's API. I am saving the names to a file, and I noticed that characters in some languages are not saved correctly.

For example: 日の火曜日 is saved as æ¥ã®ç«ææ¥.

I figured I may not be saving it in utf-8 so I changed my code to

fd = io.open("filename",'w',encoding='utf8')
fd.write(str(name.encode('utf-8'), 'utf-8'))

But I still get the same result.

The API I am using is here.

My understanding so far is that UTF-8 should be able to handle text in all languages. Also, the console in Ubuntu uses UTF-8 by default, so it should print the strings correctly if I run a command like more.

Answer 1

Check the response code, get the charset from the response, and use it to decode. If name is a bytes object, you can use name.decode('utf-8') or str(name, 'utf-8') to decode it.

For example:

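from urllib.request import urlopen

resp = urlopen(url)
if resp.status == 200:
    # Fall back to UTF-8 if the response does not declare a charset.
    charset = resp.info().get_content_charset() or 'utf-8'
    name = resp.read()  # raw bytes; here name stands for the response body
    with open('filename', 'w', encoding='utf-8') as fd:
        fd.write(str(name, charset))
        # or: fd.write(name.decode(charset))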
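Decoding only applies to bytes. If name is already a str, the encode/decode round trip from the question returns the same string and changes nothing. A minimal sketch of that distinction, assuming a hypothetical helper to_text (not part of any library) and UTF-8 as the fallback charset:

```python
def to_text(value, charset=None):
    """Return a str; decode only when value is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumption: fall back to UTF-8 when no charset is given.
        return value.decode(charset or "utf-8")
    return value


raw = "日の火曜日".encode("utf-8")  # bytes, as they would arrive over HTTP
name = to_text(raw, "utf-8")        # decoded once into a str

# The round trip from the question: encode, then decode again.
round_trip = str(name.encode("utf-8"), "utf-8")

print(round_trip == name)
```

If the comparison holds, the extra encode/decode step is not what fixes or breaks the output, so the cause has to be looked for elsewhere.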

Answer 2

The problem was my SSH client, which was displaying the strings incorrectly. The code was working fine.
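One way to tell a display problem from a storage problem is to check the bytes in the file rather than what the terminal shows. The sketch below writes the example string as UTF-8, verifies the raw bytes, and then simulates a client that reads those bytes as Latin-1. The Latin-1 interpretation and the temporary file name are assumptions for illustration; the actual encoding used by the SSH client is not stated.

```python
import os
import tempfile

name = "日の火曜日"

# Write the text as UTF-8, then read the raw bytes back so the check does
# not depend on how any terminal renders them.
path = os.path.join(tempfile.mkdtemp(), "names.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as fd:
    fd.write(name)
with open(path, "rb") as fd:
    raw = fd.read()

file_is_correct = raw.decode("utf-8") == name

# Simulate a client that interprets the UTF-8 bytes as Latin-1 and does not
# render the C1 control characters (0x80 to 0x9F).
misread = raw.decode("latin-1")
visible = "".join(ch for ch in misread if not 0x80 <= ord(ch) <= 0x9F)

print(file_is_correct)
print(visible)
```

Under these assumptions the file content is valid UTF-8, while the simulated client shows the same kind of garbled text as in the question.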

2 answers

solution1
0 2016-06-28 20:02:16

solution2
0 ACCEPTED 2016-06-28 22:02:34
