简体   繁体   中英

Calculate using dplyr, percentage of NA'S in each column

I have a data frame with some columns with missing values. Is there a way (using dplyr) to efficiently calculate the percentage of each column that is missing ie NA. Sought of like a colSum equivalent. So I dont have to calculate each column percentage missing individually ?

First, I created a test data for you:

a<- c(1,NA,NA,4)
b<- c(NA,2,3,4)
x<- data.frame(a,b)
x
#    a  b
# 1  1 NA
# 2 NA  2
# 3 NA  3
# 4  4  4

Then you can use colMeans(is.na(x)) :

colMeans(is.na(x))
#    a    b 
# 0.50 0.25 

We can use summarise_each

 library(dplyr)
 x %>% 
   summarise_each(funs(100*mean(is.na(.))))

喜欢这种简洁的purrr::map类型:

x %>% map(~ mean(is.na(.)))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM