Python Script, Pandas Special Characters
I am using this script to geocode addresses. The script works fine; however, the output file converts special characters such as 中央区 and Athénée to gibberish, i.e.:
中央区 -> ä¸å¤®åŒº
Athénée -> AthÃ©nÃ©e
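The gibberish is the typical result of UTF-8 bytes being decoded with a single-byte code page. A minimal sketch, assuming the viewer falls back to Windows-1252 (cp1252) when it does not recognise the file as UTF-8, reproduces this kind of output; mojibake is a hypothetical helper name:

```python
def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then wrongly decode those bytes with a single-byte code page."""
    return text.encode("utf-8").decode(wrong_codec)


for original in ["中央区", "Athénée"]:
    garbled = mojibake(original)
    # Reversing the mistake (encode as cp1252, decode as UTF-8) recovers the original text
    recovered = garbled.encode("cp1252").decode("utf-8")
    print(original, "->", garbled, "| recoverable:", recovered == original)
```

Because no bytes are lost in this mix-up, the garbled text can be reversed, which suggests the data itself is intact and only its interpretation is wrong.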
The input file is a UTF-8 .CSV saved in Mac Excel. The script uses pandas to process the data. How can I support special characters such as the above?
The code for the full script can be found here: https://github.com/shanealynn/python_batch_geocode/blob/master/python_batch_geocoding.py
import pandas as pd
import requests
import logging
import time

# Logger used below (minimal setup for this excerpt; the full script configures its own)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

#------------------ CONFIGURATION -------------------------------
# Set your output file name here.
output_filename = '/Users/_Library/Python/geobatch/res1000_output.csv'
# Set your input file here
input_filename = "/Users/_Library/Python/geobatch/res1000.csv"
# Specify the column name in your input data that contains addresses here
address_column_name = "Address"
# Return Full Google Results? If True, full JSON results from Google are included in output
RETURN_FULL_RESULTS = False

#------------------ DATA LOADING --------------------------------
# Read the data to a Pandas Dataframe
data = pd.read_csv(input_filename, encoding='utf8')
addresses = data[address_column_name].tolist()

# One geocoding result per address is collected here
results = []
# ... geocoding loop omitted (see the full script linked above) ...

# All done
logger.info("Finished geocoding all addresses")
# Write the full results to csv using the pandas library.
# encoding='utf8' writes UTF-8 without a byte-order mark (BOM)
pd.DataFrame(results).to_csv(output_filename, encoding='utf8')
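Before changing encodings, it can help to confirm that the output file is itself valid UTF-8; if it is, the data is intact and the problem lies in how the file is opened. A small stdlib sketch; check_utf8_file is a hypothetical helper, and the demo writes its own temporary file standing in for res1000_output.csv:

```python
import os
import tempfile

UTF8_BOM = b"\xef\xbb\xbf"


def check_utf8_file(path: str) -> dict:
    """Report whether a file decodes as UTF-8 and whether it starts with a UTF-8 BOM."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"valid_utf8": valid, "has_bom": raw.startswith(UTF8_BOM)}


# Demo: write a small CSV as plain UTF-8, the same bytes encoding='utf8' produces
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("Address\n中央区\nAthénée\n")
    report = check_utf8_file(path)

print(report)
```

A result of valid_utf8=True with has_bom=False on the real output would point at the viewer guessing the wrong encoding rather than at pandas.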
If I insert the line:
data['Address'] = data['Address'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))
to re-encode the inputs as escape sequences, the output becomes:
中央区 -> \u4e2d\u592e\u533a
instead of ä¸å¤®åŒº
which I presume is a step in the right direction. Could someone build on this?
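The map() workaround replaces each non-ASCII character with an ASCII backslash escape, so the file displays the same in any encoding but no longer contains the original characters. A short sketch of that transformation and its reversal; escape_like_workaround and undo_escape are hypothetical helper names:

```python
def escape_like_workaround(text: str) -> str:
    """Same transformation as the map() lambda: non-ASCII characters become backslash escapes."""
    return text.encode("unicode-escape").decode("utf-8")


def undo_escape(escaped: str) -> str:
    """Turn the ASCII escape sequences back into the original characters."""
    return escaped.encode("ascii").decode("unicode-escape")


for text in ["中央区", "Athénée"]:
    escaped = escape_like_workaround(text)
    print(text, "->", escaped)
    assert undo_escape(escaped) == text
```

This prints 中央区 -> \u4e2d\u592e\u533a and Athénée -> Ath\xe9n\xe9e: reversible, but the output file would then hold escape text rather than the addresses themselves.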
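This is the solution I am working with: writing the output with the 'utf-8-sig' encoding, so that the file starts with a UTF-8 byte-order mark (BOM), which Excel uses to recognise the file as UTF-8: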
# Write the full results to csv using the pandas library.
pd.DataFrame(results).to_csv(output_filename, encoding='utf-8-sig')
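A stdlib sketch of what 'utf-8-sig' changes at the byte level. It writes a temporary demo file rather than res1000_output.csv, and it assumes (as I understand pandas' behaviour) that to_csv passes the encoding name through to Python's codecs, so plain open() produces the same bytes:

```python
import os
import tempfile

UTF8_BOM = b"\xef\xbb\xbf"
text = "Address\n中央区\nAthénée\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    # 'utf-8-sig' writes the 3-byte BOM, then the ordinary UTF-8 bytes
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
    with open(path, "rb") as f:
        raw = f.read()
    # Reading back with 'utf-8-sig' strips the BOM...
    with open(path, encoding="utf-8-sig", newline="") as f:
        sig_text = f.read()
    # ...while plain 'utf-8' keeps it as U+FEFF at the start of the text
    with open(path, encoding="utf-8", newline="") as f:
        plain_text = f.read()

print(raw[:3])
```

Reading with 'utf-8-sig' is also safe for files that have no BOM, since the codec only strips one when it is present.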