Cumulative Count of Members by Month DPLYR
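I have a list of members by the month they signed up, and I want to create a data frame that lists the total number of members by month.
Raw data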
month.list <- structure(c(18444, 18687, 18475, 18506, 18536, 18567, 18597,18718, 18659, 18628, 18779, 18748), class = "Date")
total.membership.working <- structure(list(`Mem Account` = c(26137295, 26139796, 26400007,26400455, 26402031, 26402078, 26402239, 1092287142, 1092295228,1092473120), Month = structure(c(18444, 18687, 18444, 18444,18475, 18475, 18444, 18779, 18779, 18779), class = "Date")), row.names = c(NA,-10L), groups = structure(list(`Mem Account` = c(26137295, 26139796,26400007, 26400455, 26402031, 26402078, 26402239, 1092287142,1092295228, 1092473120), .rows = structure(list(1L, 2L, 3L, 4L,5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of","vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame"), .drop = TRUE), class = c("grouped_df","tbl_df", "tbl", "data.frame"))
I have written a for loop that does this, but I am hoping to find a tidy way to do it without a loop.
For loop
library(dplyr)

total.membership <- data.frame()
for (i in seq_along(month.list)) {
  # Count the distinct accounts that signed up on or before month i
  foo <- total.membership.working %>%
    ungroup() %>%
    filter(Month <= month.list[i]) %>%
    summarise(Month = max(Month),
              total_membership = n_distinct(`Mem Account`))
  # Append this month's row to the running result
  total.membership <- total.membership %>%
    bind_rows(foo)
}
Expected output
total.membership <- structure(list(Month = structure(c(18444, 18687, 18475, 18506,18536, 18567, 18597, 18628, 18779, 18748), class = "Date"), total_membership = c(45886L,58128L, 47878L, 49214L, 51119L, 53390L, 55200L, 56299L, 60503L,59583L)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 10L, 11L,12L), class = "data.frame")
> total.membership
Month total_membership
1 2020-07-01 45886
2 2021-03-01 58128
3 2020-08-01 47878
4 2020-09-01 49214
5 2020-10-01 51119
6 2020-11-01 53390
7 2020-12-01 55200
8 2021-04-01 58902
9 2021-02-01 57238
10 2021-01-01 56299
11 2021-06-01 60503
12 2021-05-01 59583
Try this code to compute the cumulative number of unique accounts for each month.
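library(dplyr)

total.membership.working %>%
  ungroup() %>%
  arrange(Month) %>%
  # !duplicated() is TRUE the first time an account appears, so cumsum() counts unique accounts so far
  mutate(cum_n = cumsum(!duplicated(`Mem Account`))) %>%
  group_by(Month) %>%
  # The running count at the end of each month is that month's cumulative total
  summarise(cum_unique_entries = max(cum_n))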
# Month cum_unique_entries
# <date> <int>
#1 2020-07-01 4
#2 2020-08-01 6
#3 2021-03-01 7
#4 2021-06-01 10
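The result above only has rows for months in which at least one account appears, while the expected output in the question has one row for every month in month.list. One possible way to carry the running total through months with no new sign-ups is sketched below. It is only a sketch under stated assumptions: `members`, `account`, and `all_months` are hypothetical toy stand-ins for total.membership.working, `Mem Account`, and month.list, and the approach (count new accounts per month, join against the full month list, then cumsum) is a swapped-in variant rather than the answer's own method.

```r
library(dplyr)

# Hypothetical toy data: one row per account and sign-up month
members <- tibble(
  account = c(1, 2, 3, 4, 5, 6),
  Month = as.Date(c("2020-07-01", "2020-07-01", "2020-08-01",
                    "2020-10-01", "2020-10-01", "2020-10-01"))
)

# Hypothetical list of months to report on (2020-09-01 has no sign-ups)
all_months <- as.Date(c("2020-07-01", "2020-08-01", "2020-09-01", "2020-10-01"))

monthly <- members %>%
  group_by(account) %>%
  summarise(Month = min(Month), .groups = "drop") %>%    # first month each account appears
  count(Month, name = "new_members") %>%                 # new accounts per month
  full_join(tibble(Month = all_months), by = "Month") %>% # add months with no sign-ups
  mutate(new_members = coalesce(new_members, 0L)) %>%     # those months contribute zero
  arrange(Month) %>%
  mutate(total_membership = cumsum(new_members)) %>%      # running total of unique accounts
  filter(Month %in% all_months) %>%                       # keep only the requested months
  select(Month, total_membership)

print(monthly)
```

Using full_join() rather than right_join() keeps sign-up months that are missing from the month list in the running total before the final filter() drops their rows.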