Dealing with Missing Values for one Variable in R

Question

I'm currently working with a data set that has missing values, but only in one variable. I want to determine whether they are missing at random, so that I can simply remove the affected rows from the data frame. To do that, I am trying to find potential correlations between the NAs in the data frame and the values of the other variables. I found the following code online:

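library("VIM")
data(sleep)
# Indicator data frame: 1 = value is missing, 0 = value is observed
x <- as.data.frame(abs(is.na(sleep)))
head(sleep)
head(x)
# Keep only the variables that contain at least one missing value
y <- x[which(sapply(x, sd) > 0)]
# Correlation between the missingness indicators
cor(y)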

However, this only shows how the missing values themselves are correlated with each other, which helps when they are distributed across several variables.

Is there a way to find not the correlation between the missing values in a data frame, but the correlation between the missing values of one variable and the values of another variable? For example, if you have a survey that optionally asks for family income, how could you determine in R whether the missing values are, e.g., correlated with low income?

Answer 1

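library(finalfit)
library(dplyr)

# Example data: only C contains missing values
df <- data.frame(
  A = c(1,2,4,5),
  B = c(55,44,3,6),
  C = c(NA, 4, NA, 5)
)

# Pairs plot of A and C, including their missing-data patterns
df %>%
  missing_pairs("A", "C")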
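
The same idea can also be sketched in base R, without extra packages: build a 0/1 indicator for the missing variable and relate it to the observed variables. The data below is simulated purely for illustration; the names survey, age, hours, income and income_missing are hypothetical and not from the question. Note that this can only relate missingness to variables that are observed. Whether income is missing because it is low cannot be checked from the data alone.

```r
# Simulated survey data (an assumption for illustration only):
# 'income' is the only variable with NAs, and it is made more likely
# to be missing for older respondents.
set.seed(42)
n <- 200
survey <- data.frame(
  age   = round(rnorm(n, mean = 45, sd = 12)),
  hours = round(rnorm(n, mean = 35, sd = 8))
)
survey$income <- 20 + 0.5 * survey$age + 0.8 * survey$hours + rnorm(n, sd = 5)
is_dropped <- runif(n) < plogis((survey$age - 45) / 10 - 1)
survey$income[is_dropped] <- NA

# Indicator: 1 = income is missing, 0 = income is observed
survey$income_missing <- as.integer(is.na(survey$income))

# Correlation between the missingness indicator and each observed variable
cors <- sapply(survey[c("age", "hours")], cor, y = survey$income_missing)
print(cors)

# Compare 'age' between rows with and without a reported income
print(t.test(age ~ income_missing, data = survey))

# Logistic regression: do the observed variables predict missingness?
fit <- glm(income_missing ~ age + hours, data = survey, family = binomial)
print(summary(fit))
```

If none of the observed variables is related to income_missing, that is consistent with the values being missing completely at random, but it does not prove it.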

1 answer

solution1
1 ACCEPTED 2021-05-21 08:19:48
