Python script, Pandas special characters

Question

I am using this script to geocode addresses. The script works fine, but the output file turns special characters such as 中央区 and Athénée into gibberish, i.e.:

中央区 -> ä¸å¤®åŒº

Athénée -> AthÃ©nÃ©e

The input file is a UTF-8 .csv saved from Excel for Mac. The script uses Pandas to process the data. How can I support special characters such as those above?

The code for the full script can be found here: https://github.com/shanealynn/python_batch_geocode/blob/master/python_batch_geocoding.py

    import pandas as pd
    import requests
    import logging
    import time

    #------------------ CONFIGURATION -------------------------------

    # Set your output file name here.
    output_filename = '/Users/_Library/Python/geobatch/res1000_output.csv'
    # Set your input file here
    input_filename = "/Users/_Library/Python/geobatch/res1000.csv"
    # Specify the column name in your input data that contains addresses here
    address_column_name = "Address"
    # Return Full Google Results? If True, full JSON results from Google are included in output
    RETURN_FULL_RESULTS = False

    #------------------ DATA LOADING --------------------------------

    # Read the data to a Pandas Dataframe
    data = pd.read_csv(input_filename, encoding='utf8')

    addresses = data[address_column_name].tolist()

    # ... (geocoding loop omitted here; see the full script linked above) ...

    # All done
    logger.info("Finished geocoding all addresses")
    # Write the full results to csv using the pandas library.
    pd.DataFrame(results).to_csv(output_filename, encoding='utf8')

Answer 1

If I insert the line:

    data['Address'] = data['Address'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

to encode and then decode the inputs, the output becomes:

中央区 -> \中\央\区 instead of ä¸å¤®åŒº

which I presume is one step in the right direction. Could someone build on this?
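
For context, here is a minimal sketch of how gibberish like the above can arise. It assumes the output file holds correct UTF-8 bytes and that the program opening it decodes them as Windows-1252 (`cp1252`). That encoding is a guess based on the characters shown, not something stated in the question. The helper names `misread_utf8` and `undo_misread` are made up for illustration.

```python
# Sketch: UTF-8 bytes decoded with the wrong (assumed) legacy encoding.
def misread_utf8(text, wrong_encoding="cp1252"):
    """Encode as UTF-8, then decode as if the bytes were in wrong_encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def undo_misread(garbled, wrong_encoding="cp1252"):
    """Reverse the misreading: the original bytes are still intact."""
    return garbled.encode(wrong_encoding).decode("utf-8")


garbled = misread_utf8("Athénée")
assert garbled == "AthÃ©nÃ©e"               # matches the gibberish in the question
assert undo_misread(garbled) == "Athénée"   # nothing was actually lost

# The unicode-escape round trip from the line above, in Python 3, produces
# plain ASCII escape text rather than the original characters.
escaped = "中央区".encode("unicode-escape").decode("utf-8")
assert escaped == "\\u4e2d\\u592e\\u533a"
```

If that assumption holds, the data written by `to_csv` is not damaged. Only the reader's choice of encoding is wrong, so escaping the input does not address the cause.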

Answer 2

Apparently this is the solution I am working with:

    # Write the full results to csv using the pandas library.
    pd.DataFrame(results).to_csv(output_filename, encoding='utf-8-sig')
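
As a rough illustration of why this probably helps: Python's `utf-8-sig` codec writes the same bytes as `utf-8` but puts a three-byte byte order mark (BOM) in front, which Excel generally uses as a hint that the file is UTF-8. The sketch below uses only the standard library to show the difference at the byte level. The sample text is made up, and it is assumed (not verified here) that `to_csv` with `encoding='utf-8-sig'` produces the same prefix.

```python
import codecs

# Hypothetical sample row, standing in for a line of the output CSV.
text = "中央区,Athénée\n"

plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # BOM + the same UTF-8 bytes

# The only difference is the 3-byte BOM prefix (EF BB BF).
assert with_bom == codecs.BOM_UTF8 + plain
assert with_bom[:3] == b"\xef\xbb\xbf"

# Reading back with utf-8-sig strips the BOM again; it also accepts
# input that has no BOM at all.
assert with_bom.decode("utf-8-sig") == text
assert plain.decode("utf-8-sig") == text

# Reading with plain utf-8 keeps the BOM as a leading U+FEFF character.
assert with_bom.decode("utf-8") == "\ufeff" + text
```

One consequence is that anything reading the output file later with plain `utf-8` will see a stray `\ufeff` at the start of the first header name, so reading it back with `encoding='utf-8-sig'` is the safer choice.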

Answer 1
0 2018-09-21 19:25:34

Answer 2
0 Accepted 2018-09-22 18:10:22