R: How to calculate rates between observations by group?
Suppose we have a data frame like the following:
cohort month customers
Jan 01 523
Jan 02 332
Jan 03 221
Jan 04 190
Feb 02 489
Feb 03 310
Feb 04 205
Mar 03 372
Mar 04 192
Apr 04 340
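For reproducibility, the data above could be built roughly like this. The name `df` follows the answers further down; storing `month` as a zero-padded character column is an assumption based on how the table is printed.

```r
# Minimal reproducible version of the table above.
# Assumption: month is kept as a zero-padded string ("01".."04").
df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)
```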
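My goal is to create a new column that stores the retention rate (RR) for each cohort. To do this, I need to divide the number of customers in the last month (04) by the number of customers who originally joined each cohort.
I am struggling to produce two tables with dplyr, as follows.
One with the current retention rate of each cohort: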
cohort rr
Jan 0.36
Feb 0.42
Mar 0.52
And, perhaps most importantly, another that gives me the evolution of the RR by month, like this:
cohort month customers period rr
Jan 01 523 0 1
Jan 02 332 1 0.63
Jan 03 221 2 0.42
Jan 04 190 3 0.36
Feb 02 489 0 1
Feb 03 310 1 0.63
Feb 04 205 2 0.42
Mar 03 372 0 1
Mar 04 192 1 0.52
Apr 04 340 0 1
One dplyr option could be:
df %>%
  group_by(cohort) %>%
  mutate(period = 1:n() - 1,
         rr = customers/first(customers))
cohort month customers period rr
<chr> <int> <int> <dbl> <dbl>
1 Jan 1 523 0 1
2 Jan 2 332 1 0.635
3 Jan 3 221 2 0.423
4 Jan 4 190 3 0.363
5 Feb 2 489 0 1
6 Feb 3 310 1 0.634
7 Feb 4 205 2 0.419
8 Mar 3 372 0 1
9 Mar 4 192 1 0.516
10 Apr 4 340 0 1
And for the summary table (the current retention rate of each cohort):
df %>%
  group_by(cohort) %>%
  summarise(rr = last(customers)/first(customers))
cohort rr
<chr> <dbl>
1 Apr 1
2 Feb 0.419
3 Jan 0.363
4 Mar 0.516
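Note that the output above is sorted alphabetically and includes Apr, whose single observation trivially gives a rate of 1, whereas the table requested in the question lists only Jan, Feb and Mar in calendar order with two decimals. A minimal sketch of one way to match that layout, assuming the rows of `df` appear in cohort order and that single-month cohorts should be dropped (the name `current_rr` is only illustrative):

```r
library(dplyr)

df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)

current_rr <- df %>%
  # Keep cohorts in order of first appearance instead of alphabetical order
  mutate(cohort = factor(cohort, levels = unique(cohort))) %>%
  group_by(cohort) %>%
  # Assumption: a cohort with a single month has no retention rate yet
  filter(n() > 1) %>%
  summarise(rr = round(last(customers) / first(customers), 2))

current_rr
```

Rounding inside `summarise()` loses precision, so it may be preferable to keep the full value and round only when printing.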
Does this work:
df %>% group_by(cohort) %>% summarise(rr = sum(customers[n()])/customers[1])
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
cohort rr
<chr> <dbl>
1 Apr 1
2 Feb 0.419
3 Jan 0.363
4 Mar 0.516
For the second one (the month-by-month table), another take:
df %>% group_by(cohort) %>% mutate(period = 0:(n()-1), rr = customers/customers[1])
# A tibble: 10 x 5
# Groups: cohort [4]
cohort month customers period rr
<chr> <chr> <dbl> <int> <dbl>
1 Jan 01 523 0 1
2 Jan 02 332 1 0.635
3 Jan 03 221 2 0.423
4 Jan 04 190 3 0.363
5 Feb 02 489 0 1
6 Feb 03 310 1 0.634
7 Feb 04 205 2 0.419
8 Mar 03 372 0 1
9 Mar 04 192 1 0.516
10 Apr 04 340 0 1
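Both answers rely on the rows already being sorted by month within each cohort, because `first()`, `last()`, `customers[1]` and `n()` are positional. If that ordering is not guaranteed, a sketch along the following lines derives `period` and `rr` from the month value instead. It assumes `month` can be converted to an integer and that all months fall within one year; `month_num` and `evolution` are illustrative names.

```r
library(dplyr)

df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)

# Reverse the rows to show that the result does not depend on input order
shuffled <- df[nrow(df):1, ]

evolution <- shuffled %>%
  mutate(month_num = as.integer(month)) %>%
  group_by(cohort) %>%
  arrange(month_num, .by_group = TRUE) %>%
  mutate(
    # Months elapsed since the cohort's first observed month
    period = month_num - min(month_num),
    # Customers relative to the cohort's first observed month
    rr = customers / customers[which.min(month_num)]
  ) %>%
  ungroup() %>%
  select(-month_num)

evolution
```

With `.by_group = TRUE` the cohorts come out in alphabetical order; converting `cohort` to a factor with the desired levels first would keep calendar order.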