Compute relative frequencies with group totals using dplyr
I have the following toy data:
data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A",
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA,
-16L))
Using the commands:
data <- table(data$class, data$value)
data <- as.data.frame(data)
data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)$Freq
I calculate the relative frequency of each value within each class:
> data
Var1 Var2 Freq rel_freq
1 A 1 3 0.2727273
2 B 1 3 0.6000000
3 A 2 4 0.3636364
4 B 2 2 0.4000000
5 A 3 4 0.3636364
6 B 3 0 0.0000000
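The division above works because aggregate() returns one total per class (A, then B) and R recycles those two totals across the six rows, which happen to alternate A, B. A minimal base R sketch that does not depend on row order is shown here; it rebuilds the toy data under the hypothetical name `toy` (since `data` was overwritten above) and uses prop.table() with margin = 1 so that each class row sums to 1:

```r
# Hypothetical name `toy`: the same toy data as in the question, rebuilt
# because `data` was overwritten by the table above.
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A",
                   "A", "A", "B", "B", "A", "A", "A", "A"))
)

# Cross-tabulate class (rows) by value (columns).
counts <- table(toy$class, toy$value)

# margin = 1 divides each cell by its row total, i.e. by the class total.
rel <- prop.table(counts, margin = 1)

# Long format, one row per class/value pair, column named rel_freq.
rel_df <- as.data.frame(rel, responseName = "rel_freq")
rel_df
```

Here `rel_df` has the columns Var1 (class), Var2 (value) and rel_freq.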
I wonder how to construct an equivalent dplyr pipeline. Pasted below is my attempt:
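library(dplyr)
library(tidyr)  # complete() is provided by tidyr
# `data` here is the original toy data frame, not the table built above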
data %>%
group_by(value, class) %>%
summarise(n = n()) %>%
complete(class, fill = list(n = 0)) %>%
mutate(freq = n / sum(n))
This computes relative frequencies within each value, that is, across the two classes for a given value, instead of relative to each class total:
Source: local data frame [6 x 4]
Groups: value [3]
value class n freq
<int> <fctr> <dbl> <dbl>
1 1 A 3 0.5000000
2 1 B 3 0.5000000
3 2 A 4 0.6666667
4 2 B 2 0.3333333
5 3 A 4 1.0000000
6 3 B 0 0.0000000
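The "Groups: value [3]" line in this output hints at the cause: when summarise() is applied to data grouped by several variables, it drops only the last grouping variable, so the result is still grouped by value and sum(n) is taken within each value. A small sketch to check this, assuming the original toy data is rebuilt under the hypothetical name `toy`:

```r
library(dplyr)

# Hypothetical name `toy`: the same toy data as in the question.
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A",
                   "A", "A", "B", "B", "A", "A", "A", "A"))
)

counts <- toy %>%
  group_by(value, class) %>%
  summarise(n = n())

# Grouping variables still attached after summarise();
# only the last one (class) has been dropped.
remaining <- group_vars(counts)
remaining
```

Recent dplyr versions may also print a message about the remaining grouping when summarise() runs.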
You only need to group by class for computing the frequencies, so drop the value grouping by regrouping on class before the mutate step:
data %>%
group_by(value, class) %>%
summarise(n = n()) %>%
complete(class, fill = list(n = 0)) %>%
group_by(class) %>%
mutate(freq = n / sum(n))
# A tibble: 6 x 4
value class n freq
<int> <fctr> <dbl> <dbl>
1 1 A 3 0.2727273
2 1 B 3 0.6000000
3 2 A 4 0.3636364
4 2 B 2 0.4000000
5 3 A 4 0.3636364
6 3 B 0 0.0000000
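As a possible variation on the pipeline above, count() can stand in for the group_by() plus summarise() pair. This is only a sketch, not the answer's own code: it assumes dplyr and tidyr are installed and rebuilds the toy data under the hypothetical name `toy`. Because count() returns an ungrouped result, complete() is given both value and class so that the missing (3, B) pair is added with a count of zero:

```r
library(dplyr)
library(tidyr)

# Hypothetical name `toy`: the same toy data as in the question.
toy <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A",
                   "A", "A", "B", "B", "A", "A", "A", "A"))
)

result <- toy %>%
  count(value, class) %>%                         # one row per observed pair
  complete(value, class, fill = list(n = 0L)) %>% # add unobserved pairs, n = 0
  group_by(class) %>%                             # totals are per class
  mutate(freq = n / sum(n)) %>%
  ungroup()

result
```

The final ungroup() is optional; it just keeps later steps from silently operating per class.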