Python unable to export to Stata due to a unicode issue?

Question

I'm trying to export a pandas DataFrame from Python to a Stata .dta file. This is a slimmed-down version of the code I'm using:

import pandas as pd

df_master = pd.read_stata(old_dta_location)

# Do some data manipulation.

# Write final_date as a Stata daily date ("td") and leave out the index column.
df_master.to_stata(new_dta_location, convert_dates={"final_date": "td"}, write_index=False)

I get the following error when I do this:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 11: ordinal not in range(256)

I know there are other questions about unicode errors, but they are not related to Stata, so suggestions such as passing an argument like 'encoding = "utf8"' don't work here.

How can I fix this?

Answer 1

By default, pandas exports to the Stata 10 file format (version code 114), which does not support unicode.
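The failure itself can be reproduced without pandas. The following is only a minimal sketch using the standard library (the variable names are illustrative): the Euro sign from the error message has no Latin-1 code point, while UTF-8 encodes it in three bytes.

```python
euro = "\u20ac"  # the Euro sign reported in the error message

try:
    euro.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Latin-1 only covers code points 0-255; U+20AC is outside that range.
    latin1_ok = False

utf8_bytes = euro.encode("utf-8")
print(latin1_ok, utf8_bytes)
```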

Specify a later format version (118 or higher) to export unicode columns without error:

import pandas as pd

df = pd.DataFrame({'animal': ['€falcon', '€parrot', '€falcon', '€parrot']})
# version=118 selects a file format that stores strings as UTF-8.
df.to_stata('animals.dta', version=118)
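To check that the characters survive, the file can be read back. This is a sketch rather than a definitive recipe: the temporary directory and the shortened sample data are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Illustrative sample data containing a character outside Latin-1.
df = pd.DataFrame({"animal": ["€falcon", "€parrot"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "animals.dta")
    # version=118 writes strings as UTF-8 instead of Latin-1.
    df.to_stata(path, version=118, write_index=False)
    round_trip = pd.read_stata(path)

print(round_trip["animal"].tolist())
```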

Answer 2

Stata files can accept UTF-8 data; it is just that to_stata uses Latin-1 encoding by default, and Latin-1 does not contain the € character. A possible workaround is to use a StataWriterUTF8 object directly:

w = pd.io.stata.StataWriterUTF8('foo.dta', df_master)
w.write_file()
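The date conversion and index options from the question can be passed to the writer as well. The sketch below assumes a small made-up frame standing in for df_master, with a hypothetical price column, and writes to a temporary directory. Only the final_date column name is taken from the question.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataWriterUTF8

# Hypothetical stand-in for df_master; the values are invented for illustration.
df_master = pd.DataFrame({
    "price": ["€10", "€25"],
    "final_date": pd.to_datetime(["2021-03-25", "2021-04-06"]),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.dta")
    writer = StataWriterUTF8(
        path,
        df_master,
        convert_dates={"final_date": "td"},  # store as a Stata daily date
        write_index=False,
        version=118,
    )
    writer.write_file()
    result = pd.read_stata(path)

print(result)
```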
