How to calculate the p-value of each column in a data frame with NA values using shapiro.test in R?

This is what I have tried so far. It runs, but it only gives me p-values for the data that have no NAs. Much of my data contains NA values, in a few places up to a third of the data.

normal <- apply(cor_phys, 2, function(x) shapiro.test(x)$p.value)

I tried adding na.rm to the function, but that does not work. How can I get a p-value for every column, including the ones with NA values?
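One possible approach, sketched here on made-up data: shapiro.test() takes no na.rm argument, which is why passing it fails. Its documentation states that missing values are allowed as long as 3 to 5000 non-missing values remain, so dropping the NAs explicitly mostly makes that behaviour visible and guards against columns that are too short. The `toy` data frame and the helper name `shapiro_p` are assumptions for illustration and stand in for cor_phys.

```r
# Hypothetical toy data standing in for cor_phys
set.seed(1)
toy <- data.frame(a = rnorm(30), b = rnorm(30), c = runif(30))
toy$b[1:10] <- NA        # a third of column b is missing
toy$c[c(3, 7)] <- NA     # a couple of scattered NAs in column c

# Hypothetical helper: drop the NAs of one column, then run the test
shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3) return(NA_real_)  # shapiro.test() needs at least 3 values
  shapiro.test(x)$p.value
}

# sapply() iterates over the columns of a data frame
normal <- sapply(toy, shapiro_p)

# Number of non-missing values each p-value is based on
n_used <- sapply(toy, function(x) sum(!is.na(x)))
```

Using sapply() on the data frame instead of apply() avoids the conversion to a matrix, which would turn every column into character if any single column is non-numeric. Reporting `n_used` next to the p-values shows how much data each test actually saw.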

library(tidyverse)                      # provides %>%, rownames_to_column, gather, map_chr

# calculate the correlations between all variables
corres <- cor_phys %>%                  # cor_phys is my data
  as.matrix %>%
  cor(use = "complete.obs") %>%         # complete.obs drops every row that contains an NA
  as.data.frame %>%
  rownames_to_column(var = 'var1') %>%
  gather(var2, value, -var1)
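A small sketch of what use = "complete.obs" means when many values are missing: a row is discarded if any of its columns is NA, whereas use = "pairwise.complete.obs" keeps, for each pair of columns, every row where both values are present. The `toy_cor` data frame is made up for illustration and is not part of my data.

```r
# Hypothetical data with NAs in different rows of each column
toy_cor <- data.frame(x = c(1, 2, 3, 4, NA, 6),
                      y = c(2, 4, NA, 8, 10, 12),
                      z = c(6, 5, 4, 3, 2, NA))

# Rows that survive listwise deletion (no NA in any column)
n_complete <- sum(complete.cases(toy_cor))

# Listwise deletion: only the fully observed rows are used for every pair
listwise <- cor(as.matrix(toy_cor), use = "complete.obs")

# Pairwise deletion: each pair of columns uses all rows where both are observed
pairwise <- cor(as.matrix(toy_cor), use = "pairwise.complete.obs")
```

With NAs spread over many columns, listwise deletion can leave far fewer rows than any single column has, so it is worth checking `n_complete` before trusting the correlations.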

# remove duplicate correlations (keep only one of "a b" and "b a")
# note: splitting on ' ' assumes the variable names contain no spaces
corres <- corres %>%
  mutate(var_order = paste(var1, var2) %>%
           strsplit(split = ' ') %>%
           map_chr(~ sort(.x) %>%
                     paste(collapse = ' '))) %>%
  mutate(cnt = 1) %>%
  group_by(var_order) %>%
  mutate(cumsum = cumsum(cnt)) %>%
  filter(cumsum != 2) %>%                  # drop the second occurrence of each pair
  ungroup %>%
  select(-var_order, -cnt, -cumsum)        # remove the helper columns

I did not write the correlation code myself; it comes from an answer that worked for my needs. The page I used is: How to compute correlations between all columns in R and detect highly correlated variables
