
Create a frequency table with dplyr that counts factor levels and missing values

Some questions are similar to this one (here or here, for example), and I know one solution that works, but I would like a more elegant answer.

I work in epidemiology and my variables are coded 1 and 0 (or NA). Example: Does the patient have cancer?

NA or 0 means no; 1 means yes.

Let's say I have several variables in my dataset and I want to count only the 1s in each variable. It's a classic frequency table, but dplyr is making things more complicated than I imagined at first glance.

My code is working:

dataset %>%
  select(VISimpair, HEARimpai, IntDis, PhyDis, EmBehDis, LearnDis, 
         ComDis, ASD, HealthImpair, DevDelays) %>%  # replace to your needs
  summarise_all(funs(sum(1-is.na(.))))

And you can reproduce this code here:

library(tidyverse)
dataset <- data.frame(var1 = rep(c(NA,1),100), var2=rep(c(NA,1),100))

dataset %>% select(var1, var2) %>% summarise_all(funs(sum(1-is.na(.))))

But what I really want is to select the variables I need, count how many 0s (or NAs) and how many 1s each one has, and report the result like this: (expected output)

Thanks.

What about the following frequency table per variable?

First, I edit your sample data to also include 0's and load the necessary libraries.

library(tidyr)
library(dplyr)
dataset <- data.frame(var1 = rep(c(NA,1,0),100), var2=rep(c(NA,1,0),100))

Second, I convert the data using gather to make it easier to group_by later for the frequency table created by count , as mentioned by CPak.

dataset %>%
    select(var1, var2) %>%
    gather(var, val) %>%
    mutate(val = factor(val)) %>%
    group_by(var, val) %>%
    count()

# A tibble: 6 x 3
# Groups:   var, val [6]
  var   val       n
  <chr> <fct> <int>
1 var1  0       100
2 var1  1       100
3 var1  NA      100
4 var2  0       100
5 var2  1       100
6 var2  NA      100
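
Since the question treats NA and 0 both as "no", the same long-format approach can collapse them before counting. This is a minimal sketch, assuming tidyr 1.0.0 or later (where `pivot_longer()` supersedes `gather()`); the `status` column and the "yes"/"no" labels are illustrative names, not from the original post.

```r
library(tidyr)
library(dplyr)

dataset <- data.frame(var1 = rep(c(NA, 1, 0), 100), var2 = rep(c(NA, 1, 0), 100))

freq <- dataset %>%
  pivot_longer(c(var1, var2), names_to = "var", values_to = "val") %>%
  # %in% returns FALSE for NA, so NA and 0 both fall into "no"
  mutate(status = if_else(val %in% 1, "yes", "no")) %>%
  count(var, status) %>%
  # one row per variable, one column per status
  pivot_wider(names_from = status, values_from = n, values_fill = 0L)

freq
```

The result has one row per variable with a `no` and a `yes` column, which is closer to a report-style table than the long output above.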

A quick and dirty method to do this is to coerce your input into factors:

dataset$var1 = as.factor(dataset$var1)
dataset$var2 = as.factor(dataset$var2)
summary(dataset$var1)
summary(dataset$var2)

summary() reports the number of occurrences of each factor level.
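
The factor approach has to be repeated for each column. A base R sketch that tabulates several columns in one pass might look like the following; it uses `table()` with `useNA = "always"` so the missing count always appears, and the `vars` vector is an illustrative name, not from the original post.

```r
dataset <- data.frame(var1 = rep(c(NA, 1, 0), 100), var2 = rep(c(NA, 1, 0), 100))

vars <- c("var1", "var2")

# Fix the levels so every column yields the same rows, then tabulate with NA included
freq_matrix <- sapply(dataset[vars], function(x) {
  table(factor(x, levels = c(0, 1)), useNA = "always")
})

freq_matrix
```

Each column of `freq_matrix` holds the counts for one variable, with rows for 0, 1, and NA.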
