R: How to calculate rates between observations by group?
Suppose we have a data frame like the following:
cohort month customers
Jan 01 523
Jan 02 332
Jan 03 221
Jan 04 190
Feb 02 489
Feb 03 310
Feb 04 205
Mar 03 372
Mar 04 192
Apr 04 340
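For reproducibility, the table above could be built as a data frame along these lines. This is only a sketch: the name `df` and the column types (month kept as a zero-padded character string, customers as numeric) are assumptions, since only the printed data is shown.

```r
# Assumed reconstruction of the example data; `df` is a name chosen here.
df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)

# Inspect the structure: 10 rows, 3 columns
str(df)
```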
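My goal is to create a new column that stores the retention rate for each cohort. To do this, I need to compute the number of customers in the latest month (04) relative to the number of customers who originally joined each cohort.
I am struggling to produce two tables with dplyr, as shown below:
The current retention rate for each cohort: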
cohort rr
Jan 0.36
Feb 0.42
Mar 0.52
And, perhaps most importantly, another one that gives me the evolution of the retention rate (RR) by month, as follows:
cohort month customers period rr
Jan 01 523 0 1
Jan 02 332 1 0.63
Jan 03 221 2 0.42
Jan 04 190 3 0.36
Feb 02 489 0 1
Feb 03 310 1 0.63
Feb 04 205 2 0.42
Mar 03 372 0 1
Mar 04 192 1 0.52
Apr 04 340 0 1
One dplyr option could be:
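library(dplyr)

df %>%
  group_by(cohort) %>%
  # period: row position within the cohort, starting at 0
  # rr: customers relative to the cohort's first row
  mutate(period = 1:n() - 1,
         rr = customers / first(customers))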
cohort month customers period rr
<chr> <int> <int> <dbl> <dbl>
1 Jan 1 523 0 1
2 Jan 2 332 1 0.635
3 Jan 3 221 2 0.423
4 Jan 4 190 3 0.363
5 Feb 2 489 0 1
6 Feb 3 310 1 0.634
7 Feb 4 205 2 0.419
8 Mar 3 372 0 1
9 Mar 4 192 1 0.516
10 Apr 4 340 0 1
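Note that `1:n() - 1` numbers rows in the order they appear, so it equals the number of elapsed months only if each cohort's rows are sorted and no month is missing. A hedged variant, assuming `month` can be converted to an integer, derives `period` from `month` itself. The names `month_num` and `evolution` are introduced here for illustration only.

```r
library(dplyr)

# Assumed reconstruction of the example data
df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)

evolution <- df %>%
  # Numeric month, so differences between months can be computed
  mutate(month_num = as.integer(month)) %>%
  group_by(cohort) %>%
  # Sort within each cohort so first() refers to the earliest month
  arrange(month_num, .by_group = TRUE) %>%
  mutate(period = month_num - first(month_num),
         rr = customers / first(customers)) %>%
  ungroup()

print(evolution)
```

Because `arrange(.by_group = TRUE)` sorts by the grouping column first, the cohorts come out in alphabetical order rather than calendar order.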
And for the other table, the current retention rate per cohort:
df %>%
  group_by(cohort) %>%
  summarise(rr = last(customers) / first(customers))
cohort rr
<chr> <dbl>
1 Apr 1
2 Feb 0.419
3 Jan 0.363
4 Mar 0.516
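The result above lists cohorts alphabetically and includes Apr, whose rate is 1 by construction because it has a single observation. If the table from the question is wanted (calendar order, no Apr), one possible sketch is shown below. It assumes the cohort labels match R's built-in `month.abb`; the name `current_rr` is chosen here for illustration.

```r
library(dplyr)

# Assumed reconstruction of the example data
df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)

current_rr <- df %>%
  # Factor ordered by calendar month, so groups sort Jan, Feb, Mar, ...
  mutate(cohort = factor(cohort, levels = month.abb)) %>%
  group_by(cohort) %>%
  # Drop cohorts with a single observation, where the rate is trivially 1
  filter(n() > 1) %>%
  summarise(rr = last(customers) / first(customers))

print(current_rr)
```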
Does this work:
df %>% group_by(cohort) %>% summarise(rr = sum(customers[n()])/customers[1])
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
cohort rr
<chr> <dbl>
1 Apr 1
2 Feb 0.419
3 Jan 0.363
4 Mar 0.516
For the second table, another take:
df %>% group_by(cohort) %>% mutate(period = 0:(n()-1), rr = customers/customers[1])
# A tibble: 10 x 5
# Groups: cohort [4]
cohort month customers period rr
<chr> <chr> <dbl> <int> <dbl>
1 Jan 01 523 0 1
2 Jan 02 332 1 0.635
3 Jan 03 221 2 0.423
4 Jan 04 190 3 0.363
5 Feb 02 489 0 1
6 Feb 03 310 1 0.634
7 Feb 04 205 2 0.419
8 Mar 03 372 0 1
9 Mar 04 192 1 0.516
10 Apr 04 340 0 1
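For comparison, the same two columns can also be computed without dplyr. The following is a sketch in base R using `ave()`, assuming the rows of each cohort are already in month order, as in the example data.

```r
# Assumed reconstruction of the example data
df <- data.frame(
  cohort    = c("Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar", "Mar", "Apr"),
  month     = c("01", "02", "03", "04", "02", "03", "04", "03", "04", "04"),
  customers = c(523, 332, 221, 190, 489, 310, 205, 372, 192, 340),
  stringsAsFactors = FALSE
)

# Row position within each cohort, starting at 0
df$period <- ave(seq_along(df$cohort), df$cohort, FUN = seq_along) - 1L

# Customers relative to the first row of each cohort
df$rr <- ave(df$customers, df$cohort, FUN = function(x) x / x[1])

print(df)
```

Unlike the grouped dplyr pipeline, `ave()` keeps the original row order and returns a plain data frame.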