How to import a .dta file with pandas and describe the data?

Question

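I'm new to Python and have a simple question. First, I want to load some sample data that I created in Stata. Second, I want to describe the data in Python; that is, I want a list of the imported variable names. So far I have done this: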

from pandas.io.stata import StataReader

reader = StataReader('sample_data.dta')
data = reader.data()

dir()

I get the following warning:

anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
  warnings.warn("'data' is deprecated, use 'read' instead")

What does this mean, and how do I fix it? Also, is dir() the right way to find out which variables I have in my data?

Answer 1

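Reading Stata files with pandas.io.stata.StataReader.data has been deprecated as of pandas 0.18.1, which is why you get that warning. It is a warning, not an error: the data is still loaded. As the message says, you could replace reader.data() with reader.read().

The simpler route is to read the file with pandas.read_stata, as follows: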

import pandas as pd

df = pd.read_stata('sample_data.dta')
df.dtypes    # variable (column) names and their dtypes
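To answer the dir() part of the question: dir() lists the names in your Python session, not the variables in the dataset. A minimal sketch of listing the variable names (df.columns) and the Stata variable labels is below. It assumes pandas is installed; the small file it writes first, with its columns "id" and "income" and their labels, is a hypothetical stand-in for your own sample_data.dta so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_data.dta")

    # Hypothetical stand-in for the .dta file saved from Stata.
    pd.DataFrame({"id": [1, 2, 3], "income": [10.5, 20.0, 15.25]}).to_stata(
        path,
        write_index=False,
        variable_labels={"id": "Person ID", "income": "Monthly income"},
    )

    # One-step read into a DataFrame (no deprecated StataReader.data() call).
    df = pd.read_stata(path)
    var_names = list(df.columns)  # the variable names, instead of dir()
    df.info()                     # names, dtypes and non-null counts

    # Variable labels are stored in the file; a StataReader can return them.
    with pd.read_stata(path, iterator=True) as reader:
        var_labels = reader.variable_labels()

print(var_names)   # ['id', 'income']
print(var_labels)  # {'id': 'Person ID', 'income': 'Monthly income'}
```

For a quick numeric summary, similar to Stata's summarize, df.describe() also works on the same DataFrame.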

Answer 2

Sometimes this doesn't work for me, especially when the dataset is large. So what I suggest here is a two-step approach (Stata, then Python).

In Stata, run the following command:

export excel Cevdet.xlsx, firstrow(variables)

To also copy the variable labels, run the following:

preserve
    describe, replace
    list
    export excel using myfile.xlsx, replace firstrow(variables)
restore

This produces two files: Cevdet.xlsx (the data) and myfile.xlsx (one row per variable, with its name and label).

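Now switch to a Jupyter notebook and run:

import pandas as pd

df = pd.read_excel('Cevdet.xlsx')       # the data exported from Stata
labels = pd.read_excel('myfile.xlsx')   # variable names and labels from describe, replace

This reads both files into Jupyter (Python 3).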
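Once both files are loaded, the labels sheet can be turned into a name-to-label mapping, for example to rename columns. The sketch below assumes the sheet exported by describe, replace has columns named "name" and "varlab", which is what recent Stata versions create; check the header row of your own myfile.xlsx. The function stata_label_map is hypothetical, and the two small DataFrames stand in for the pd.read_excel calls so the example runs without the files.

```python
import pandas as pd


def stata_label_map(desc):
    """Map variable names to labels from a `describe, replace` export.

    Assumes 'name' and 'varlab' columns (check your Stata version).
    Variables without a label fall back to their own name.
    """
    names = desc["name"].astype(str)
    labels = desc["varlab"].fillna("").astype(str)
    return {n: (lab if lab.strip() else n) for n, lab in zip(names, labels)}


# Stand-in for pd.read_excel('myfile.xlsx').
desc = pd.DataFrame({
    "name": ["id", "income", "region"],
    "varlab": ["Person ID", "Monthly income", None],
})
# Stand-in for pd.read_excel('Cevdet.xlsx').
df = pd.DataFrame({"id": [1, 2], "income": [10.5, 20.0], "region": ["N", "S"]})

label_map = stata_label_map(desc)
df_labelled = df.rename(columns=label_map)
print(list(df_labelled.columns))  # ['Person ID', 'Monthly income', 'region']
```

Keeping the original short names as columns and using the mapping only for display is often easier, since labels may contain spaces.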

My suggestion is to save this DataFrame as a pickle file (especially when the data is large):

df.to_pickle('Cevdet')

The next time you open Jupyter, you only need to run:

df = pd.read_pickle("Cevdet")
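The save-once, reload-later idea can be wrapped so the slow Excel read only happens the first time. This is a minimal sketch, not part of the original answer: load_with_cache and slow_loader are hypothetical names, and slow_loader stands in for pd.read_excel('Cevdet.xlsx').

```python
import os
import tempfile

import pandas as pd


def load_with_cache(pickle_path, load_original):
    """Return the DataFrame cached at pickle_path, creating the cache on first use.

    load_original is any zero-argument function that builds the DataFrame
    the slow way, e.g. lambda: pd.read_excel("Cevdet.xlsx").
    """
    if os.path.exists(pickle_path):
        return pd.read_pickle(pickle_path)  # fast path: reuse the saved copy
    df = load_original()                    # slow path: read the source once
    df.to_pickle(pickle_path)
    return df


calls = []


def slow_loader():
    # Hypothetical stand-in for pd.read_excel('Cevdet.xlsx'); records each call.
    calls.append(1)
    return pd.DataFrame({"id": [1, 2, 3], "income": [10.5, 20.0, 15.25]})


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "Cevdet.pkl")
    first = load_with_cache(cache, slow_loader)   # calls slow_loader, writes the pickle
    second = load_with_cache(cache, slow_loader)  # reads the pickle instead

print(len(calls))  # 1
pd.testing.assert_frame_equal(first, second)
```

Only load pickle files you created yourself, since unpickling untrusted data can run arbitrary code.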

Answer 1
1 vote · accepted · 2016-08-21 14:18:16

Answer 2
0 votes · 2019-03-31 15:03:17