
Encoding data in UTF-8

For a project I have to download data for many cities around the world, so the names contain special and accented characters, but I can't get them to display correctly.

I have tried to encode it as UTF-8, but without luck. I don't know why: I get no errors in the terminal, yet city names still come out like L'H\u00f4pital Puits II or Marsza\u0142kowska, Warszawa.

Can someone help pinpoint the error, or suggest what I can try?

import requests

w = open("cittadine.txt","wb")

fullMap = requests.get("http://aqicn.org/map/world/").text
print type(fullMap) # <type 'unicode'>
fullMap = fullMap.encode("utf-8")
w.writelines(fullMap)

Your code is OK. The reason you're getting L'H\u00f4pital Puits II is that the server is sending that exact string:

curl "http://aqicn.org/map/world/" | grep -o "L'H\\\\u00f4pital Puits II"
L'H\u00f4pital Puits II

That string appears in a block of JSON, so you need to find that block and decode it with the json module, which converts the \u00f4 escape sequence back into the proper character (ô).

Beautiful Soup is probably the best way to find the JSON block.
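A minimal sketch of the decoding step, assuming the city data sits in a JSON array inside a script tag. The page_text sample, the mapData variable name, the "city" key, and the extract_cities helper are all made up for illustration; the real page layout may differ. A regular expression stands in for Beautiful Soup here so that the snippet needs only the standard library.

```python
import json
import re

# Hypothetical stand-in for the downloaded page. The raw string keeps the
# backslash-u sequences as literal characters, just as the server sends them.
page_text = r'''<script>var mapData = [{"city": "L'H\u00f4pital Puits II"}, {"city": "Marsza\u0142kowska, Warszawa"}];</script>'''


def extract_cities(text):
    """Find the first JSON array in the text and return the decoded city names."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        return []
    # json.loads turns each \uXXXX escape into the real Unicode character.
    records = json.loads(match.group(0))
    return [record["city"] for record in records]


cities = extract_cities(page_text)
```

After this runs, cities holds proper Unicode strings containing ô and ł rather than the escaped forms, and those strings can then be written out as UTF-8.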

Suggestion

A neater way to write UTF-8 to a file is to open it with io.open and an encoding, which returns a text wrapper (io.TextIOWrapper) that automatically encodes Unicode characters on write:

import io

import requests

fullMap = requests.get("http://aqicn.org/map/world/").text
print(type(fullMap))  # <type 'unicode'> on Python 2

# The file object encodes the text as UTF-8 on write and is closed on exit.
with io.open("cittadine.txt", "w", encoding="utf-8") as w:
    w.write(fullMap)
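To see what the encoding wrapper does, here is a small round trip that needs no network access. It assumes the names have already been decoded into Unicode strings; the names list and the temporary output path are illustrative only.

```python
import io
import os
import tempfile

# Already-decoded Unicode names (illustrative sample data).
names = [u"L'H\u00f4pital Puits II", u"Marsza\u0142kowska, Warszawa"]

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "cittadine.txt")

# Text mode with an explicit encoding: Unicode in, UTF-8 bytes on disk.
with io.open(path, "w", encoding="utf-8") as out:
    for name in names:
        out.write(name + u"\n")

# Read the file back in binary mode to inspect the bytes that were written.
with io.open(path, "rb") as raw:
    data = raw.read()
```

The bytes in data contain the two-byte UTF-8 sequences for ô and ł, and decoding them as UTF-8 gives back the original names.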

If you need to write Unicode to a Windows terminal, install win-unicode-console: https://github.com/Drekin/win-unicode-console
