
Correctly treating NA values in an SPSS .sav file imported into R using the haven package

My platform is Windows 10

The data in my .sav file looks like this (the screenshots are from PSPP, not SPSS):

Data View: [screenshot]

Variable View: [screenshot]

I'm using haven to import the .sav file into R:

library("tidyverse")
library("haven")

haven commands (my .sav filename is spss_missing99.sav):

> spss2 <- read_sav("C:/.../spss_missing99.sav")
> spss2

# A tibble: 11 x 1
   Points
    <dbl>
 1      1
 2      2
 3      3
 4      4
 5      5
 6      6
 7      7
 8      8
 9      9
10     10
11     NA


> is.na(spss2)

      Points
 [1,]  FALSE
 [2,]  FALSE
 [3,]  FALSE
 [4,]  FALSE
 [5,]  FALSE
 [6,]  FALSE
 [7,]  FALSE
 [8,]  FALSE
 [9,]  FALSE
[10,]  FALSE
[11,]   TRUE

> mean(spss2)

[1] NA
Warning message:
In mean.default(spss2) : argument is not numeric or logical: returning NA


> mean(spss2, na.rm = TRUE)

[1] NA
Warning message:
In mean.default(spss2, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

My question: why don't the last two mean() calls work?

Thanks.

Because you are passing a data frame (tibble) to mean(). mean() works on a numeric or logical vector, not on a whole data frame, so extract the column first:

mean(spss2$Points, na.rm = TRUE)
#[1] 5.5
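
To see why the original calls fail, here is a minimal sketch that reproduces the behaviour without the .sav file. It assumes a plain base R data frame (the hypothetical points_df) as a stand-in for spss2; a tibble returned by read_sav() should behave the same way when passed to mean().

```r
# Stand-in for the imported data: ten values plus one missing value.
# (points_df is a hypothetical object; the original spss2 comes from read_sav().)
points_df <- data.frame(Points = c(1:10, NA))

# A data frame is a list of columns, not a numeric vector.
is.numeric(points_df)          # FALSE
is.numeric(points_df$Points)   # TRUE

# mean() on the whole data frame warns and returns NA, even with na.rm = TRUE,
# because the check for a numeric argument happens before NAs are removed.
whole_df_mean <- suppressWarnings(mean(points_df, na.rm = TRUE))

# Extract the column first; na.rm = TRUE then drops the missing value.
column_mean <- mean(points_df$Points, na.rm = TRUE)
double_bracket_mean <- mean(points_df[["Points"]], na.rm = TRUE)
```

Both `$` and `[[` return the column as a vector, so either form works; `[[` is convenient when the column name is stored in a variable.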

You can also pass the data frame to colMeans(), which returns the column-wise mean of every column in the data frame:

colMeans(spss2, na.rm = TRUE)

#Points 
#   5.5 
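
One caveat: colMeans() needs every column to be numeric. If the imported file also contained, say, a text column, a rough sketch of one way to handle it is to keep only the numeric columns first. The survey_df data frame and its Group column below are made up for illustration and are not part of the original data.

```r
# Hypothetical data frame with one numeric and one character column.
survey_df <- data.frame(
  Points = c(1:10, NA),
  Group  = rep(c("a", "b"), length.out = 11)
)

# Flag the numeric columns, then average only those.
numeric_cols <- vapply(survey_df, is.numeric, logical(1))
col_means <- colMeans(survey_df[, numeric_cols, drop = FALSE], na.rm = TRUE)
```

drop = FALSE keeps the result a data frame even when only one column is selected, which colMeans() requires.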


 