How to preserve labels when an SPSS file (.sav) is imported into pandas via rpy?
I'm looking to work with SPSS files (.sav) using pandas. In the absence of the SPSS program, here's what a typical file looks like when converted to .csv:
On investigating what the first two rows signify (I don't know SPSS), it seems that the first row contains the Labels, while the second row contains the VarNames.
When I bring the file into pandas thus:
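import pandas.rpy.common as com

def savtocsv(filename):
    # Read the .sav file through R's foreign package as a data.frame
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Convert the R data.frame to a pandas DataFrame
    w = com.convert_robj(w)
    return w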
and then do a head(), the first row (Label) is missing:
How can labels be maintained?
Labels in a sav file are stored in the variable.labels attribute of the object returned by the read.spss function.
You can get the variable labels with the following:
import pandas.rpy.common as com

def get_labels(filename):
    # Read the file in R and extract its "variable.labels" attribute
    w = com.robj.r('attr(foreign::read.spss("%s"), "variable.labels")' % filename)
    w = com.convert_robj(w)
    return w
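Once the labels are available, the two-row layout described in the question (labels on the first row, variable names on the second) can be reproduced when exporting to .csv. The following is only a sketch using the standard library: `write_labeled_csv` and the sample values are hypothetical, and it assumes the labels and variable names are plain lists of strings in the same order.

```python
import csv
import io


def write_labeled_csv(out, var_names, labels, rows):
    """Write labels as row 1, variable names as row 2, then the data rows."""
    if len(var_names) != len(labels):
        raise ValueError("var_names and labels must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(labels)     # first header row: human-readable labels
    writer.writerow(var_names)  # second header row: SPSS variable names
    writer.writerows(rows)      # remaining rows: the data


# Hypothetical sample data, for illustration only.
buf = io.StringIO()
write_labeled_csv(buf, ["q1", "q2"], ["Age", "Gender"], [[34, "F"], [51, "M"]])
print(buf.getvalue())
```

A file written this way should be readable back into pandas with both header rows kept, for example via `pd.read_csv(path, header=[0, 1])`, which yields two-level column names.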
If you want to set the labels as the column names of your dataframe:
import pandas.rpy.common as com

def savtocsv(filename):
    w = com.robj.r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
    # Read the labels from the R object before converting it to pandas
    cols = list(com.robj.r("attr")(w, "variable.labels"))
    w = com.convert_robj(w)
    # Replace the variable names with the labels
    w.columns = cols
    return w
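Assigning labels directly as column names can give blank or duplicate columns if some variables have no label or share one. Here is a minimal sketch of a fallback, assuming that situation can occur in your file. The helper `label_columns` and the sample names are hypothetical and not part of pandas or rpy.

```python
def label_columns(var_names, labels):
    """Return one column name per variable, preferring its label."""
    if len(var_names) != len(labels):
        raise ValueError("var_names and labels must have the same length")
    seen = set()
    columns = []
    for name, label in zip(var_names, labels):
        label = (label or "").strip()
        # Fall back to the variable name when the label is empty or already used.
        chosen = label if label and label not in seen else name
        seen.add(chosen)
        columns.append(chosen)
    return columns


# Hypothetical variable names and labels, for illustration only.
columns = label_columns(["q1", "q2", "q3"], ["Age", "", "Age"])
print(columns)
```

The result could then be assigned in place of `cols` in the function above, e.g. `w.columns = label_columns(list(w.columns), cols)`.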