R - Subset by column status and count unique records against another dataframe
I've got a dataset that looks like this:
customer_id group_a group_b group_c group_d
123 true false true false
456 false true false true
789 false true true false
I also have each customer's records in a second dataset like this:
customer_id date
123 01/01/2019
123 01/02/2019
123 01/03/2019
123 01/04/2019
123 01/04/2019
456 01/01/2019
456 01/02/2019
456 01/03/2019
789 01/01/2019
789 01/03/2019
789 01/03/2019
For every date and every group, I'd like to get the number of unique customers who have a record on that date and are "true" for that group, along with the total number of customers in each group. The result would look like this:
date group record total
01/01/2019 a 1 1
01/02/2019 a 1 1
01/03/2019 a 1 1
01/04/2019 a 1 1
01/01/2019 b 2 2
01/02/2019 b 1 2
01/03/2019 b 2 2
01/04/2019 b 0 2
01/01/2019 c 2 2
01/02/2019 c 1 2
01/03/2019 c 2 2
01/04/2019 c 1 2
01/01/2019 d 1 1
01/02/2019 d 1 1
01/03/2019 d 1 1
01/04/2019 d 0 1
I don't feel this is very elegant, but the result matches your expected output, so here it is:
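library(lubridate)
library(dplyr)
library(tidyr)

# df1 holds the customer/group flags, df2 the customer/date records.
# Parse the month/day/year strings into Date values.
df2$date <- mdy(df2$date)

# One row per (date, group, customer) where the customer belongs to the group.
# This assumes the flags are stored as the strings "true"/"false".
df3 <- df2 %>%
  inner_join(df1, by = "customer_id") %>%
  gather(key = "group", value = "member", group_a:group_d) %>%
  filter(member == "true") %>%
  complete(date, group) %>%  # missing date/group pairs get an NA customer_id
  select(date, group, customer_id)

# Unique customers per group and date, joined to unique customers per group.
result <- df3 %>%
  group_by(group, date) %>%
  summarise(record = n_distinct(customer_id, na.rm = TRUE)) %>%
  left_join(
    df3 %>%
      group_by(group) %>%
      summarise(total = n_distinct(customer_id, na.rm = TRUE)),
    by = "group"
  ) %>%
  ungroup() %>%
  select(date, group, record, total)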
which gives:
# A tibble: 16 x 4
   date       group   record total
   <date>     <chr>    <int> <int>
 1 2019-01-01 group_a      1     1
 2 2019-01-02 group_a      1     1
 3 2019-01-03 group_a      1     1
 4 2019-01-04 group_a      1     1
 5 2019-01-01 group_b      2     2
 6 2019-01-02 group_b      1     2
 7 2019-01-03 group_b      2     2
 8 2019-01-04 group_b      0     2
 9 2019-01-01 group_c      2     2
10 2019-01-02 group_c      1     2
11 2019-01-03 group_c      2     2
12 2019-01-04 group_c      1     2
13 2019-01-01 group_d      1     1
14 2019-01-02 group_d      1     1
15 2019-01-03 group_d      1     1
16 2019-01-04 group_d      0     1
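For anyone who wants to try this end to end, here is a self-contained sketch of the same idea using pivot_longer(), which tidyr now recommends over gather(). It is a variation, not the code that produced the output above. It makes a few assumptions not stated in the question: the flags are stored as logicals rather than strings, the dates are month/day/year, and the total is counted from the flag table rather than from customers that have records. The names `sample_flags`, `sample_records`, `totals` and `result` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question (flags assumed to be logical).
sample_flags <- data.frame(
  customer_id = c(123, 456, 789),
  group_a = c(TRUE, FALSE, FALSE),
  group_b = c(FALSE, TRUE, TRUE),
  group_c = c(TRUE, FALSE, TRUE),
  group_d = c(FALSE, TRUE, FALSE)
)
sample_records <- data.frame(
  customer_id = c(123, 123, 123, 123, 123, 456, 456, 456, 789, 789, 789),
  date = as.Date(
    c("01/01/2019", "01/02/2019", "01/03/2019", "01/04/2019", "01/04/2019",
      "01/01/2019", "01/02/2019", "01/03/2019",
      "01/01/2019", "01/03/2019", "01/03/2019"),
    format = "%m/%d/%Y"  # assumed month/day/year
  )
)

# Number of customers flagged TRUE in each group.
totals <- sample_flags %>%
  pivot_longer(starts_with("group_"), names_to = "group",
               names_prefix = "group_", values_to = "member") %>%
  filter(member) %>%
  count(group, name = "total")

result <- sample_records %>%
  distinct(customer_id, date) %>%                 # one row per customer/date
  inner_join(sample_flags, by = "customer_id") %>%
  pivot_longer(starts_with("group_"), names_to = "group",
               names_prefix = "group_", values_to = "member") %>%
  filter(member) %>%                              # keep group members only
  count(group, date, name = "record") %>%         # unique customers per pair
  complete(group, date, fill = list(record = 0L)) %>%  # absent pairs become 0
  left_join(totals, by = "group") %>%
  arrange(group, date) %>%
  select(date, group, record, total)

print(result)
```

Because of names_prefix, the group column holds a, b, c, d as in the question's expected output rather than group_a and so on. Note that complete() only fills combinations of groups and dates that survive the filter, so a group whose members have no records at all would be missing from `result` in this sketch.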