
Dealing with Missing Values for one Variable in R

I'm currently working with a data set that has missing values, but only in one single variable. I want to determine whether they are missing at random, so that I can simply remove those rows from the data frame. To that end, I am trying to find potential correlations between the NAs in the data frame and the values of the other variables. I found the following code online:

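library("VIM")
data(sleep, package = "VIM")  # VIM's mammal sleep data, with NAs in several columns
# Indicator data frame: 1 where a value is missing, 0 where it is observed
x <- as.data.frame(abs(is.na(sleep)))
head(sleep)
head(x)
# Keep only the columns that actually contain missing values
y <- x[which(sapply(x, sd) > 0)]
# Correlations between the missingness indicators themselves
cor(y)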

However, this only shows how the missing values themselves are correlated with each other, which helps only when they are distributed across several variables.

Is there a way to find, not the correlation between the missing values in a data frame, but the correlation between the missing values of one variable and the values of another variable? For example, if you have a survey with an optional question about family income, how could you determine in R whether the missing values are, e.g., correlated with low income?

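library(finalfit)
library(dplyr)

# Toy data: only C contains missing values
df <- data.frame(
  A = c(1,2,4,5),
  B = c(55,44,3,6),
  C = c(NA, 4, NA, 5)
)

# Pairs plot comparing the rows with missing and with observed values
# (dependent = "A", explanatory = "C")
df %>%
  missing_pairs("A", "C")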
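
The same comparison can also be sketched numerically in base R, without extra packages. The snippet below is only an illustration under assumptions: the toy data frame is made up (a slightly longer version of the one above), and the names C_missing, miss_cor and group_means are hypothetical. It codes the missingness of C as a 0/1 indicator and relates that indicator to the observed variables. Note that this can only reveal a relationship with variables that were actually observed; whether missingness depends on the unobserved values of C itself (e.g. low income not being reported) cannot be checked from the data alone.

```r
# Toy data (made up for illustration): only C has missing values
df <- data.frame(
  A = c(1, 2, 4, 5, 7, 8),
  B = c(55, 44, 3, 6, 20, 31),
  C = c(NA, 4, NA, 5, 6, NA)
)

# 1 where C is missing, 0 where it is observed
df$C_missing <- as.integer(is.na(df$C))

# Correlation between the missingness indicator of C and each other variable
miss_cor <- sapply(df[c("A", "B")], function(v) cor(df$C_missing, v))
print(miss_cor)

# Mean of A in rows where C is observed ("0") versus missing ("1")
group_means <- tapply(df$A, df$C_missing, mean)
print(group_means)
```

With a realistically sized data set, a t.test() of A split by the indicator, or a logistic regression such as glm(C_missing ~ A + B, family = binomial), would be natural next steps; the toy data here is too small for those to say anything meaningful.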

