I am using Ubuntu and Python 3.4 to download data from Wikipedia's API. I am saving the names, and I noticed that characters from other languages are not saved correctly.
For example: 日の火曜日 is saved as æ¥ã®ç«ææ¥.
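The garbled text matches what you get when UTF-8 bytes are decoded as Latin-1 (ISO-8859-1) somewhere between the file and the screen. Here is a minimal sketch that reproduces it, assuming Latin-1 is the wrong decoding involved (some clients use cp1252 instead, which renders a few bytes differently):

```python
original = "日の火曜日"

# Encode to UTF-8 bytes, then (wrongly) decode those bytes as Latin-1.
# Latin-1 maps every byte to one character, so each CJK character
# (3 UTF-8 bytes) becomes 3 characters, some of them invisible
# C1 control characters (U+0080..U+009F).
garbled = original.encode("utf-8").decode("latin-1")

# Dropping the invisible control characters leaves exactly the
# text shown above.
visible = "".join(c for c in garbled if not 0x80 <= ord(c) <= 0x9F)
print(visible)  # æ¥ã®ç«ææ¥

# No bytes were lost, so the damage is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

Because the round trip recovers the original string, the underlying bytes are intact; only their interpretation is wrong.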
I figured I might not be saving it as UTF-8, so I changed my code to:

```python
import io

fd = io.open("filename", 'w', encoding='utf8')
fd.write(str(name.encode('utf-8'), 'utf-8'))
```
But I still get the same result.
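In Python 3 a `str` is already Unicode text, so encoding it to UTF-8 and immediately decoding it back returns the same string. The extra step cannot change what ends up in the file. A quick check, where `name` is a hypothetical stand-in for one of the downloaded titles:

```python
name = "日の火曜日"  # hypothetical example value for a downloaded title

# Encode to UTF-8 bytes and decode straight back: a no-op for valid text.
round_trip = str(name.encode("utf-8"), "utf-8")
print(round_trip == name)  # True

# A file opened with io.open(..., encoding='utf8') already performs the
# encoding on write, so fd.write(name) produces the same file contents.
```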
The API I am using is here.
What I understand up to this point is that UTF-8 should be able to handle text in all languages. Also, the Ubuntu console uses UTF-8 by default, so it should print the strings correctly if I run a command like `more`.
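Python can report which encoding it believes the terminal and the locale use. A minimal sketch using the standard `sys` and `locale` modules. Note that this only reflects the settings on the machine running Python; over SSH, the client on the local machine decides how the received bytes are rendered, and Python cannot see that setting.

```python
import locale
import sys

# Encoding Python uses when writing to this terminal (None if stdout
# has been replaced by an object without an encoding attribute).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding Python picks for open() when no encoding is given.
preferred = locale.getpreferredencoding(False)

print("stdout:", stdout_encoding)
print("preferred:", preferred)
```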
Check the response code, get the charset from the response, and use it to decode. You can use `name.decode('utf-8')` or `str(name, 'utf-8')` to decode.

For example:
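```python
from urllib.request import urlopen

resp = urlopen(url)  # url: the API request URL
if resp.code == 200:
    name = resp.read()  # raw bytes of the response body
    charset = resp.info().get_content_charset() or 'utf-8'
    with open('filename', 'w', encoding='utf-8') as fd:
        fd.write(str(name, charset))
        # or: fd.write(name.decode(charset))
```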
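`resp.info()` returns the response headers as an `email.message.Message` subclass, and `get_content_charset()` reads the `charset` parameter of the `Content-Type` header, returning `None` when there is none. An offline sketch of that decoding step; the header values and body below are hypothetical, built by hand so the example runs without network access:

```python
from email.message import Message

# Hypothetical response headers and body standing in for a real API reply.
headers = Message()
headers["Content-Type"] = "application/json; charset=utf-8"
body = '{"title": "日の火曜日"}'.encode("utf-8")

# Fall back to UTF-8 if the server does not declare a charset.
charset = headers.get_content_charset() or "utf-8"
text = body.decode(charset)
print(charset)  # utf-8

# Without a charset parameter, get_content_charset() returns None.
bare = Message()
bare["Content-Type"] = "application/json"
print(bare.get_content_charset())  # None
```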
The problem was my SSH client: it was displaying the strings incorrectly. The code was working fine.
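Since the terminal can mislead, it helps to inspect the file's raw bytes rather than its rendered text (on the command line, `xxd filename` or `hexdump -C filename` does the same). A sketch, with a temporary file standing in for `filename` and a hypothetical example title; `binascii.hexlify` is used because `bytes.hex()` only exists from Python 3.5:

```python
import binascii
import os
import tempfile

name = "日の火曜日"  # hypothetical example value

# Write the text as UTF-8 to a temporary file.
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
with open(path, "w", encoding="utf-8") as f:
    f.write(name)

# Read the bytes back without decoding them.
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

# UTF-8 for 日 is e6 97 a5, so the dump should start with e697a5.
hexed = binascii.hexlify(raw).decode("ascii")
print(hexed)
print(raw == name.encode("utf-8"))  # True
```

If the bytes match the expected UTF-8 encoding, the file is correct and any garbling happens only when it is displayed.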