How to import .dta via pandas and describe data?
I am new to Python and have a simple problem. As a first step, I want to load some sample data I created in Stata. As a second step, I would like to describe the data in Python; that is, I'd like a list of the imported variable names. So far I've done this:
from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()
I get the following error:
anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
warnings.warn("'data' is deprecated, use 'read' instead")
What does it mean and how can I resolve the issue? And is dir() the right way to get an understanding of what variables I have in the data?
Reading a Stata file with pandas.io.stata.StataReader.data has been deprecated as of pandas 0.18.1, which is why you are getting that message. Note that it is a UserWarning, not an error.
Instead, use pandas.read_stata (or call reader.read(), as the warning itself suggests) to read the file, as shown:
import pandas as pd
df = pd.read_stata('sample_data.dta')
df.dtypes  # variable (column) names together with their dtypes
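On the dir() part of the question: called without arguments, dir() lists the names defined in the current scope, not the variables stored in the dataset. A minimal sketch of listing the imported variable names follows. Since sample_data.dta is not available here, the example first writes a small stand-in file; the column names id and wage are invented for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for sample_data.dta so the sketch runs anywhere.
sample = pd.DataFrame({"id": [1, 2, 3], "wage": [10.5, 12.0, 9.75]})
dta_path = os.path.join(tempfile.mkdtemp(), "sample_data.dta")
sample.to_stata(dta_path, write_index=False)

df = pd.read_stata(dta_path)

# The imported variable names are the DataFrame's column labels.
variable_names = list(df.columns)
print(variable_names)

# dtypes per variable, and summary statistics for the numeric ones.
print(df.dtypes)
print(df.describe())
```

With a real file, only the pd.read_stata call and the lines after it are needed.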
Sometimes this did not work for me, especially when the dataset was large. So what I propose here is a two-step approach (Stata, then Python).
In Stata, run the following command:
export excel Cevdet.xlsx, firstrow(variables)
To export the variable labels as well, run the following:
preserve
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore
This will generate two files for you: Cevdet.xlsx (the data) and myfile.xlsx (the variable descriptions).
Now go to your Jupyter notebook:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Cevdet.xlsx')
This reads the data file into Jupyter (Python 3); myfile.xlsx can be read the same way with pd.read_excel.
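Once both files are loaded, the variable descriptions can be turned into a name-to-label lookup. The sketch below is illustrative only: it assumes that myfile.xlsx contains columns called name and varlab (the names Stata's describe, replace is expected to produce), and it uses small in-memory stand-ins with invented values instead of reading the real Excel files.

```python
import pandas as pd

# Stand-in for pd.read_excel('myfile.xlsx'). The column names "name" and
# "varlab" are an assumption about what `describe, replace` exports.
labels = pd.DataFrame({
    "name": ["id", "wage"],
    "varlab": ["Person identifier", "Hourly wage"],
})

# Stand-in for pd.read_excel('Cevdet.xlsx'); the values are invented.
df = pd.DataFrame({"id": [1, 2], "wage": [10.5, 12.0]})

# Map each variable name to its Stata variable label.
label_by_variable = dict(zip(labels["name"], labels["varlab"]))

for column in df.columns:
    print(column, "->", label_by_variable.get(column, ""))
```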
My advice is to save the DataFrame to disk as a pickle (especially if it is big):
df.to_pickle('Cevdet')
The next time you open Jupyter you can simply run:
df=pd.read_pickle("Cevdet")
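As a quick check that the pickle round trip preserves the data, here is a small self-contained sketch. It uses a temporary directory and an invented DataFrame rather than the real Cevdet data.

```python
import os
import tempfile

import pandas as pd

# Invented stand-in for the DataFrame loaded from Cevdet.xlsx.
df = pd.DataFrame({"id": [1, 2, 3], "wage": [10.5, 12.0, 9.75]})

# Save to a temporary location, then load it back.
pickle_path = os.path.join(tempfile.mkdtemp(), "Cevdet")
df.to_pickle(pickle_path)
restored = pd.read_pickle(pickle_path)

# True when values, column names, and dtypes all match.
print(restored.equals(df))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.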