
R haven: missing labels and label names when reading an SPSS file

I'm using the haven package for R to read an SPSS file with user_na = TRUE. The file has many string variables with value labels. In R, only the first of the string variables (SizeofH1) has the correct value labels assigned to it as an attribute. Unfortunately, I cannot even provide a snippet of this data to make this fully reproducible, but here is a screenshot of what I can see in PSPP:

[Screenshot: PSPP Data Editor]

And here is what str() returns in R:

```
 $ SizeofH1:Class 'labelled'  atomic [1:280109] 3 3 3 3 ...
 ..- attr(*, "label")= chr "Size of Household ab 2002"
 ..- attr(*, "format.spss")= chr "A30"
 ..- attr(*, "labels")= Named chr [1:9] "1" "2" "3" "4" ...
 ..- attr(*, "names")= chr [1:9] "4 Persons" "2 Persons" "1 Person 50 years plus" "3 Persons" ...
 $ PROMOTIO: atomic  40 1 40 40 ...
 ..- attr(*, "label")= chr "PROMOTION"
 ..- attr(*, "format.spss")= chr "A30"
 $ inFMCGfr: atomic  1 1 1 1 ...
 ..- attr(*, "label")= chr "in FMCG from2011"
 ..- attr(*, "format.spss")= chr "A30"
 $ TRADESEG: atomic  1 1 1 1 ...
 ..- attr(*, "label")= chr "TRADE SEGMENT"
 ..- attr(*, "format.spss")= chr "A30"
 $ ORGANISA: atomic  111 111 111 111 ...
 ..- attr(*, "label")= chr "ORGANISATION"
 ..- attr(*, "format.spss")= chr "A30"
 $ NAME    : atomic  9 9 9 9 ...
 ..- attr(*, "label")= chr "NAME"
 ..- attr(*, "format.spss")= chr "A30"
```
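Rather than scanning the full str() output, a short check can list which columns actually came through with value labels. This is a minimal sketch: `df` below is a small hypothetical stand-in for the result of `read_sav()`, and the check only looks for a non-empty "labels" attribute on each column.

```r
library(haven)

# Hypothetical stand-in for the read_sav() result
df <- tibble::tibble(
  SizeofH1 = labelled(c("3", "1"), c("1 Person" = "1", "3 Persons" = "3"),
                      label = "Size of Household ab 2002"),
  PROMOTIO = c("40", "1")
)

# TRUE for columns that carry a non-empty "labels" attribute
has_value_labels <- vapply(
  df,
  function(x) length(attr(x, "labels", exact = TRUE)) > 0,
  logical(1)
)
has_value_labels
```

On the real file, this makes it easy to compare against the variables PSPP shows as having value labels.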

I hope someone can point me to a possible cause of this behavior.

The "semantics" vignette has some useful information on this topic.

```r
library(haven)
vignette('semantics')
```
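As a small illustration of those semantics: in haven, a string variable with value labels is a character `labelled()` vector, where the variable label and the value labels are stored as attributes, and `as_factor()` replaces the codes with their labels. The codes and labels below are hypothetical, loosely modelled on SizeofH1.

```r
library(haven)

# Character codes with character value labels (hypothetical data)
x <- labelled(
  c("3", "3", "1"),
  labels = c("1 Person" = "1", "3 Persons" = "3"),
  label = "Size of Household"
)

is.labelled(x)     # the vector carries value labels
attr(x, "label")   # the variable label
f <- as_factor(x)  # codes replaced by their value labels
as.character(f)
```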

There are a couple of ways to get the value labels. I think a good one is the example below, which uses the map family of functions from the purrr package (it could be done with lapply instead, too).

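```r
library(haven)
library(purrr)

# Get data from the SPSS file
df_raw <- read_sav(path_to_file)

# Get value labels: turn every labelled column into a factor,
# leaving all other columns unchanged
df <- modify_if(df_raw, is.labelled, as_factor)

# Get column names: use each variable label, falling back to the
# original name when a column has no label
colnames(df) <- map2_chr(df_raw, names(df_raw), function(x, nm) {
  lbl <- attr(x, "label", exact = TRUE)
  if (is.null(lbl)) nm else lbl
})
```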
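For the lapply variant mentioned above, a base-R sketch might look like this. It assumes only haven (plus tibble, which haven depends on) and uses a small hypothetical data frame in place of the real `read_sav()` output.

```r
library(haven)

# Hypothetical stand-in for read_sav() output
df_raw <- tibble::tibble(
  size  = labelled(c("1", "2", "1"),
                   labels = c("1 Person" = "1", "2 Persons" = "2"),
                   label = "Size of Household"),
  promo = c("40", "1", "40")
)

# Convert labelled columns to factors; other columns pass through unchanged
df <- tibble::as_tibble(
  lapply(df_raw, function(x) if (is.labelled(x)) as_factor(x) else x)
)

# Use variable labels as column names, keeping the original name if unlabelled
new_names <- vapply(names(df_raw), function(nm) {
  lbl <- attr(df_raw[[nm]], "label", exact = TRUE)
  if (is.null(lbl)) nm else lbl
}, character(1), USE.NAMES = FALSE)
names(df) <- new_names
df
```

Reading the labels from `df_raw` (the unconverted data) avoids depending on whether the conversion step keeps the "label" attribute.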

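The best way is to save your SPSS file as a CSV and then read it in R. I have run into this before, and some strings were not read correctly. In general, SPSS is not very smart when it comes to string variables.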
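A minimal sketch of the reading side, assuming the CSV was exported with value labels written as text rather than codes. The file contents here are hypothetical and written to a temporary file so the example runs on its own; with a real export, `csv_path` would point at that file instead.

```r
# Hypothetical CSV export; in practice this file comes from SPSS/PSPP
csv_path <- tempfile(fileext = ".csv")
writeLines(c("SizeofH1,PROMOTIO",
             "1 Person,40",
             "2 Persons,1"), csv_path)

# Read every column as character so numeric-looking codes are kept as text
# (e.g. leading zeros are not lost); check.names = FALSE keeps names as-is
df <- read.csv(csv_path, colClasses = "character", check.names = FALSE)
str(df)
```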

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM