
Python cannot export to Stata due to unicode problem?

I'm trying to export a dataframe in Python as a Stata dta. This is a slimmed version of the code I'm using:

import pandas as pd

df_master = pd.read_stata(old_dta_location)

# Do some data manipulation.

df_master.to_stata(new_dta_location, convert_dates={"final_date": "td"}, write_index=False)

I get the following error when I do this:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 11: ordinal not in range(256)

I know there are other questions about Unicode errors, but they are not related to Stata, so suggestions such as passing an argument like 'encoding = "utf8"' don't work here.

How can I fix this?

By default, pandas exports to the Stata 10 file format (dta version 114), which pandas encodes as Latin-1 and which does not support Unicode.
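The encoding limit behind the error can be reproduced without pandas or Stata. This is only a minimal sketch using the euro sign from the error message; the variable names are illustrative:

```python
euro = "\u20ac"  # the euro sign reported in the UnicodeEncodeError

# Latin-1 only covers code points 0-255, so U+20AC cannot be encoded.
try:
    euro.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent any Unicode code point.
utf8_bytes = euro.encode("utf-8")

print(latin1_ok)   # False
print(utf8_bytes)  # b'\xe2\x82\xac'
```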

Specify a later format version (118 or 119, readable by Stata 14 and later) to export columns containing Unicode without error:

df = pd.DataFrame({'animal': ['€falcon', '€parrot', '€falcon','€parrot']})
df.to_stata('animals.dta', version=118)  
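To confirm that the characters survive the export, the file can be read back with read_stata. A rough sketch, assuming pandas 1.0 or later (where version=118 is accepted); the temporary directory and the name roundtrip are only for illustration:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'animal': ['€falcon', '€parrot', '€falcon', '€parrot']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'animals.dta')
    # Version 118 stores strings as UTF-8, so the euro sign is accepted.
    df.to_stata(path, version=118, write_index=False)
    # Read the file back to check that nothing was lost or mangled.
    roundtrip = pd.read_stata(path)

print(roundtrip['animal'].tolist())
```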

Stata files can hold UTF-8 data; it is just that to_stata defaults to a format written with Latin-1 encoding, which does not contain the € character. A possible workaround is to use a StataWriterUTF8 object directly:

w = pd.io.stata.StataWriterUTF8('foo.dta', df_master)
w.write_file()
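The writer should also accept the options from the question (convert_dates and write_index) as keyword arguments. A sketch under assumptions: pandas 1.0 or later, and a small made-up frame standing in for df_master with a hypothetical label column:

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataWriterUTF8

# Made-up stand-in for df_master; only final_date comes from the question.
df_master = pd.DataFrame({
    "label": ["€10", "€20"],
    "final_date": pd.to_datetime(["2020-01-31", "2020-02-29"]),
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "master.dta")
    writer = StataWriterUTF8(
        path,
        df_master,
        convert_dates={"final_date": "td"},  # store as a Stata daily date
        write_index=False,
    )
    writer.write_file()
    # Read the file back to check both the text and the dates.
    check = pd.read_stata(path)

print(check)
```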
