简体   繁体   中英

How can I save grouped counts across factor levels into a new variable with dplyr?

i try to save grouped counts of various factor levels into a new variable:

Lets say my data look like this:


a <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
b <- c("acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con","acc", "rej", "con", "acc", "rej")

df <- data.frame(a,b) 

The resulting data frame should look like this:


a <- c(1,2,3,4)
number_acc <- c(2,2,1,2)
number_rej <- c(2,1,2,2)
number_con <- c(1,2,2,1)

I tried to solve the problem in the following way:


df2 <- df %>%
  group_by(a) %>% 
  mutate(number_acc = count(b == 'acc'), 
         number_rej = count(b == 'rej'),
         number_con = count(b == 'con'))

However, i get an error message that the method "count" cannot be applied to objects of the class "logical".

Thank you for your help!

Use the tabyl function from the janitor package:

Your data:

a <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4)
b <- c("acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con", "acc", "rej", "con","acc", "rej", "con", "acc", "rej")

df <- data.frame(a,b)

Summarize grouping by count:

library(janitor)
data_summary <- tabyl(df, a, b)
data_summary

# a acc con rej
# 1   2   1   2
# 2   2   2   1
# 3   1   2   2
# 4   2   1   2

Here is an alternative way: We could use pivot_wider with names_glue after count :

library(tidyr)
library(dplyr)

df %>% 
  count(a,b) %>% 
  pivot_wider(
    names_from = b,
    values_from = n,
    names_glue = "{b}_{'number'}"
  )
      a acc_number con_number rej_number
  <dbl>      <int>      <int>      <int>
1     1          2          1          2
2     2          2          2          1
3     3          1          2          2
4     4          2          1          2

To make the existing code to work, we need to summarise instead of mutate , and sum instead of count :

df %>%
  group_by(a) %>% 
  summarise(number_acc = sum(b == 'acc'), 
            number_rej = sum(b == 'rej'),
            number_con = sum(b == 'con'))

# # A tibble: 4 x 4
#       a number_acc number_rej number_con
#   <dbl>      <int>      <int>      <int>
# 1     1          2          2          1
# 2     2          2          1          2
# 3     3          1          2          2
# 4     4          2          2          1

But there are better ways of doing this, for example see answers at:

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM