
Compute relative frequencies with group totals using dplyr

I have the following toy data:

data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A", 
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA, 
-16L))

Using the commands:

data <- table(data$class, data$value)
data <- as.data.frame(data)
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq

I get the relative frequency of each value within each class, that is, each count divided by its class total:

> data
  Var1 Var2 Freq  rel_freq
1    A    1    3 0.2727273
2    B    1    3 0.6000000
3    A    2    4 0.3636364
4    B    2    2 0.4000000
5    A    3    4 0.3636364
6    B    3    0 0.0000000
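
As a cross-check on these numbers, here is a minimal base R sketch that makes the "divide by the class total" step explicit with prop.table() instead of relying on vector recycling. The names `toy`, `tab` and `rel` are introduced only for this illustration, and the data frame is rebuilt from the dput() output above so the snippet runs on its own.

```r
# Toy data rebuilt from the dput() output; `toy` is a hypothetical name.
toy <- data.frame(
  value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A", "A", "A",
                   "B", "B", "A", "A", "A", "A"))
)

# Rows are classes, columns are values.
tab <- table(toy$class, toy$value)

# margin = 1 divides every cell by its row total (11 for class A, 5 for class B).
rel <- as.data.frame(prop.table(tab, margin = 1))

# `Freq` now holds proportions, e.g. 3/11 for class A and value 1.
rel
```

Each class's proportions sum to 1, which is a quick way to confirm that the totals were taken per class rather than per value.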

How can I construct an equivalent dplyr pipeline? Below is my attempt, applied to the original toy data frame (not the table that overwrites it above):

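library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr
data %>%
  group_by(value, class) %>%
  summarise(n = n()) %>%                      # result is still grouped by value
  complete(class, fill = list(n = 0)) %>%     # add missing classes with n = 0
  mutate(freq = n / sum(n))                   # sum(n) is taken within each value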

This does compute relative frequencies, but unfortunately within each value (across the two classes) instead of within each class:

Source: local data frame [6 x 4]
Groups: value [3]

  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.5000000
2     1      B     3 0.5000000
3     2      A     4 0.6666667
4     2      B     2 0.3333333
5     3      A     4 1.0000000
6     3      B     0 0.0000000

The frequencies must be computed per class, but after summarise() the result is still grouped by value. Regroup by class before the mutate(), which replaces the value grouping:

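library(tidyr)  # needed for complete()
data %>%
    group_by(value, class) %>%
    summarise(n = n()) %>%
    complete(class, fill = list(n = 0)) %>%
    group_by(class) %>%                      # replaces the leftover value grouping
    mutate(freq = n / sum(n))                # sum(n) is now the class total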
# A tibble: 6 x 4
  value  class     n      freq
  <int> <fctr> <dbl>     <dbl>
1     1      A     3 0.2727273
2     1      B     3 0.6000000
3     2      A     4 0.3636364
4     2      B     2 0.4000000
5     3      A     4 0.3636364
6     3      B     0 0.0000000
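
For anyone who wants to run this end to end, the following is a self-contained sketch of the same idea, not a replacement for the answer above. It assumes dplyr 1.0.0 or later (for the `.groups` argument of summarise()) and tidyr for complete(); the names `toy`, `counts`, `result` and `leftover` are introduced only for this illustration.

```r
library(dplyr)
library(tidyr)  # complete() lives in tidyr, not dplyr

# Toy data rebuilt from the question; `toy` is a hypothetical name.
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A", "A", "A",
                   "B", "B", "A", "A", "A", "A"))
)

# Count each (value, class) pair and drop all grouping explicitly.
counts <- toy %>%
  group_by(value, class) %>%
  summarise(n = n(), .groups = "drop") %>%
  complete(value, class, fill = list(n = 0L))  # adds the missing (3, B) pair

# Regroup by class only, so sum(n) is the class total.
result <- counts %>%
  group_by(class) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# For comparison: dropping only the last grouping level leaves `value` behind,
# which is why the original attempt normalised within each value.
leftover <- toy %>%
  group_by(value, class) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  group_vars()
```

Dropping the grouping right after summarise() makes the later group_by(class) the only grouping in play, which avoids depending on whichever grouping summarise() leaves behind by default.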
