How can I retain labels when importing SPSS files (.sav) into pandas via rpy?

Question

I'm looking to work on SPSS files (.sav) using pandas. I don't have the SPSS program, so here's what a typical file looks like when converted to .csv:

On investigating what the first two rows signify (I don't know SPSS), it seems that the first row contains the Labels, while the second row contains the VarNames.
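If you already have the CSV export, a minimal pandas sketch can keep both header rows by reading them as a two-level column index. It assumes the layout described above (labels in row 1, variable names in row 2). The sample data and column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV export: row 1 holds the SPSS variable labels,
# row 2 holds the variable names, and the data starts on row 3.
sample_csv = io.StringIO(
    "Respondent age,Gender of respondent\n"
    "age,gender\n"
    "34,F\n"
    "51,M\n"
)

# header=[0, 1] turns the first two rows into a two-level column MultiIndex.
df = pd.read_csv(sample_csv, header=[0, 1])

# Keep the variable names as column names and the labels in a lookup dict.
labels = dict(zip(df.columns.get_level_values(1), df.columns.get_level_values(0)))
df.columns = df.columns.get_level_values(1)

print(df.head())
print(labels)
```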

When I bring the file into pandas like this:

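import pandas.rpy.common as com  # pandas.rpy was removed in pandas 0.20; needs rpy2 and R's foreign package

def savtocsv(filename):
    # Have R read the .sav file as a data.frame; its columns get the SPSS variable names.
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Convert the R data.frame into a pandas DataFrame.
    w = com.convert_robj(w)
    return w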

and then call head(), the first row (the labels) is missing:

How can the labels be retained?

Ref: Is there a Python module to open SPSS files?
Python: 2.7.10
pandas: 0.17.1

Answer 1

Labels in a .sav file are stored in the variable.labels attribute of the object returned by the read.spss function.

You can get the variable labels with the following:

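import pandas.rpy.common as com

def get_labels(filename):
    # Read the file in R and keep only its "variable.labels" attribute.
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    # Convert the R vector of labels into a Python object.
    w = com.convert_robj(w)
    return w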
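Once you have the labels, one way to keep them without losing the variable names is a name-to-label dictionary. The sketch below is pure pandas and hypothetical: `label_map`, the sample DataFrame and the label list stand in for the outputs of `savtocsv()` and `get_labels()`. It assumes one label per column, in column order (the same assumption that `w.columns = cols` makes), and lets empty labels fall back to the variable name.

```python
import pandas as pd


def label_map(df, labels):
    """Pair each column (SPSS variable name) with its label, by position.

    Hypothetical helper: assumes `labels` has one entry per column, in
    column order; empty labels fall back to the variable name itself.
    """
    labels = list(labels)
    if len(labels) != len(df.columns):
        raise ValueError("expected %d labels, got %d" % (len(df.columns), len(labels)))
    return {name: (label or name) for name, label in zip(df.columns, labels)}


# Hypothetical stand-ins for the outputs of savtocsv() and get_labels().
df = pd.DataFrame({"age": [34, 51], "gender": ["F", "M"]})
labels = ["Respondent age", ""]

mapping = label_map(df, labels)
print(mapping)

# Variable names stay the working column names; labels are applied only for display.
print(df.rename(columns=mapping))
```

Keeping the variable names as the real column names means code such as `df["age"]` keeps working, while labels remain available whenever a human-readable header is needed.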

If you want to set the labels as the column names of your DataFrame:

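import pandas.rpy.common as com

def savtocsv(filename):
    # Read the .sav file as an R data.frame.
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Grab the labels from the R object before converting it to pandas.
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    # Convert to pandas, then use the labels as the column names.
    w = com.convert_robj(w)
    w.columns = cols
    return w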
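Replacing the column names with labels discards the variable names. A hedged alternative sketch keeps both as a two-level column header, mirroring the two header rows of the CSV export. `with_label_header` and the sample data are hypothetical, and it again assumes one label per column, in column order.

```python
import pandas as pd


def with_label_header(df, labels):
    """Return a copy of df whose columns are a (label, varname) MultiIndex.

    Hypothetical helper: assumes `labels` has one entry per column, in column order.
    """
    labels = list(labels)
    if len(labels) != len(df.columns):
        raise ValueError("expected one label per column")
    out = df.copy()
    out.columns = pd.MultiIndex.from_arrays(
        [labels, list(df.columns)], names=["label", "varname"]
    )
    return out


# Hypothetical stand-ins for the outputs of savtocsv() and get_labels().
df = pd.DataFrame({"age": [34, 51], "gender": ["F", "M"]})
labeled = with_label_header(df, ["Respondent age", "Gender of respondent"])

# The variable names are still recoverable from the second level ...
print(list(labeled.columns.get_level_values("varname")))
# ... and to_csv writes both header rows, labels first, like the CSV export.
print(labeled.to_csv(index=False))
```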

(Answer 1: 6 votes, accepted, 2016-03-29 22:14:01)

SPSS文件（.sav）通过rpy导入pandas时如何保留标签？

问题描述

1 个解决方案

解决方案1 6 已采纳 2016-03-29 22:14:01

解决方案1
6 已采纳 2016-03-29 22:14:01