
Unicode Encode Error when writing pandas df to csv

I cleaned 400 Excel files, read them into Python using pandas, and appended all the raw data into one big DataFrame.

Then, when I try to export it to a CSV:

df.to_csv("path",header=True,index=False)

I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position 20: ordinal not in range(128)

Can someone explain what this error means and suggest a way to fix it?

Thanks

You have unicode values in your DataFrame. Files store bytes, so every unicode string has to be encoded into bytes before it can be stored in a file. You have to specify an encoding, such as utf-8. For example,

df.to_csv('path', header=True, index=False, encoding='utf-8')

If you don't specify an encoding, then the encoding used by df.to_csv defaults to ascii in Python 2, or utf-8 in Python 3.
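To see what the error message is saying, here is a minimal sketch in plain Python 3 (no pandas needed). The sample string is an assumption made up for illustration; it only shares the character u'\xc7' (Ç) with the traceback above. The ascii codec covers code points 0 to 127 only, so it refuses that character, while utf-8 can encode it.

```python
# Hypothetical sample value; only the character U+00C7 comes from the traceback.
text = "\u00c7a va"

# ascii only covers code points 0-127, so encoding U+00C7 (ordinal 199) fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failing_char = exc.object[exc.start]  # the character the codec rejected

# utf-8 can represent every Unicode code point; U+00C7 becomes two bytes.
utf8_bytes = text.encode("utf-8")

# Decoding with the same encoding gives the original string back.
round_trip = utf8_bytes.decode("utf-8")
```

Passing encoding='utf-8' to df.to_csv asks pandas to do the same kind of encoding step when it writes the file.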

Adding an answer to help myself google it later:

One trick that helped me is to encode a problematic series with unicode-escape first, then decode the resulting bytes back into a string as utf-8. Like:

df['crumbs'] = df['crumbs'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

This gets the DataFrame to print correctly too.
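One thing to keep in mind with this trick: it changes the stored values. A small sketch in plain Python 3, applying the same encode/decode pair to a single made-up string (the sample value is an assumption, not data from the question), shows that the non-ASCII character is replaced by a literal backslash escape sequence:

```python
# Hypothetical sample value containing U+00C7.
original = "\u00c7a va"

# Same transformation as in the lambda above, applied to one string.
escaped = original.encode("unicode-escape").decode("utf-8")

# The result is pure ASCII: the single character is now the four
# characters backslash, x, c, 7.
is_ascii = all(ord(ch) < 128 for ch in escaped)

# The original text can be recovered by reversing the steps.
restored = escaped.encode("ascii").decode("unicode-escape")
```

So the CSV will contain escape sequences rather than the original characters, and anything reading the file has to undo the escaping to get the text back. If the goal is to keep the characters themselves, passing encoding='utf-8' to df.to_csv avoids that extra step.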
