How to read SPSS aka (.sav) in Python

It's my first time using Jupyter Notebook to analyze survey data (a .sav file), and I would like to read it in a way that shows the metadata so I can connect the answers with the questions. I'm a total newbie in this field, so any help is appreciated!

import pandas as pd
import pyreadstat
df, meta = pyreadstat.read_sav('./SimData/survey_1.sav')
type(df)
type(meta)
df.head()

Please let me know if there is an additional step needed for me to be able to see the metadata!

The meta object contains the metadata you are looking for. Probably the most useful attributes to look at are:

  • meta.column_names_to_labels: a dictionary that maps the column names in your pandas dataframe to their labels, which are longer explanations of what each column means.
print(meta.column_names_to_labels)
  • meta.variable_value_labels: a dict where the keys are column names and each value is another dict, whose keys are the values you find in your dataframe and whose values are the value labels.
print(meta.variable_value_labels)

For instance, if you have a column "gender" with values 1 and 2, you could get {"gender": {1: "male", 2: "female"}}, which means value 1 is male and value 2 is female. You can get those labels from the beginning if you pass the argument apply_value_formats:

df, meta = pyreadstat.read_sav('survey.sav', apply_value_formats=True)

You can also apply those value formats to your dataframe at any time with pyreadstat.set_value_labels, which returns a copy of your dataframe with labels:

df_copy = pyreadstat.set_value_labels(df, meta)
  • meta.missing_ranges: tells you which values were declared as user-defined missing values. Let's say that for a certain variable in the survey they encoded 1 meaning yes and 2 meaning no, and then the missing values: 5 meaning didn't answer and 6 meaning person not at home. When you read the file with the default settings, you will get the values 1 and 2, and NaN (missing) instead of 5 and 6. You can pass the argument user_missing to get 5 and 6, and meta.missing_ranges will tell you that 5 and 6 are missing values. meta.variable_value_labels will give you the "didn't answer" and "person not at home" labels.
df, meta = pyreadstat.read_sav("survey.sav", user_missing=True)
print(meta.missing_ranges)
print(meta.variable_value_labels)
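
To make the last bullet more concrete, here is a minimal sketch that flags user-defined missing codes by hand. It assumes meta.missing_ranges has the shape described in the pyreadstat documentation, as far as I understand it: a dict mapping each column name to a list of {"lo": ..., "hi": ...} dicts. The column name at_home and the helper is_user_missing are made up for illustration, and plain dicts stand in for the real meta object so the snippet runs without a .sav file.

```python
# Stand-ins for meta.missing_ranges and meta.variable_value_labels
# (hypothetical column "at_home"; shapes assumed from the pyreadstat docs).
missing_ranges = {"at_home": [{"lo": 5, "hi": 6}]}
variable_value_labels = {
    "at_home": {1: "yes", 2: "no", 5: "didn't answer", 6: "person not at home"}
}


def is_user_missing(column, value, ranges):
    """Return True if value falls inside any missing range declared for column."""
    for rng in ranges.get(column, []):
        if rng["lo"] <= value <= rng["hi"]:
            return True
    return False


# Values as they might appear in df["at_home"] when read with user_missing=True.
answers = [1, 2, 5, 6, 1]

# Pair each raw value with its label and a flag saying whether it is a missing code.
described = [
    (
        value,
        variable_value_labels["at_home"].get(value),
        is_user_missing("at_home", value, missing_ranges),
    )
    for value in answers
]
for row in described:
    print(row)
```

With a real file you would pass meta.missing_ranges and meta.variable_value_labels instead of the stand-in dicts. String variables may need different handling, which this sketch does not cover.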

These are the pieces of information that are potentially useful for your case; not all of them will necessarily be present in your dataset.
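
Since any of these pieces may be absent, one way to connect answers with questions is to build a small codebook that tolerates gaps. The sketch below is only an illustration: build_codebook is a hypothetical helper, not part of pyreadstat, and the sample dicts imitate meta.column_names_to_labels and meta.variable_value_labels with made-up survey columns.

```python
def build_codebook(column_names_to_labels, variable_value_labels):
    """Combine question labels and value labels into one entry per column."""
    codebook = []
    for column, question in column_names_to_labels.items():
        codebook.append({
            "column": column,
            # A column may have no label in the file (None or empty string).
            "question": question or "(no label)",
            # Not every column has value labels (e.g. numeric or free-text answers).
            "values": variable_value_labels.get(column, {}),
        })
    return codebook


# Made-up stand-ins for the attributes of the meta object.
labels = {"gender": "What is your gender?", "age": None}
value_labels = {"gender": {1: "male", 2: "female"}}

codebook = build_codebook(labels, value_labels)
for entry in codebook:
    print(entry["column"], "|", entry["question"], "|", entry["values"])
```

With a real file, the call would be build_codebook(meta.column_names_to_labels, meta.variable_value_labels).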

More information here: https://ofajardo.github.io/pyreadstat_documentation/_build/html/index.html
