R - Subset by column status and count unique records based on another data frame

Question

I've got a dataset that looks like this:

customer_id    group_a    group_b    group_c    group_d
123            true       false      true       false
456            false      true       false      true
789            false      true       true       false

I also have each customer's records in a dataset like this:

customer_id    date
123            01/01/2019
123            01/02/2019
123            01/03/2019
123            01/04/2019
123            01/04/2019  

456            01/01/2019
456            01/02/2019
456            01/03/2019

789            01/01/2019
789            01/03/2019
789            01/03/2019

For every group and date, I'd like to get the count of unique customers who are "true" in that group and have a record on that date, along with the total number of customers in each group. The result would look like this:

date         group    record   total
01/01/2019   a        1        1
01/02/2019   a        1        1
01/03/2019   a        1        1
01/04/2019   a        1        1

01/01/2019   b        2        2
01/02/2019   b        1        2
01/03/2019   b        2        2
01/04/2019   b        0        2

01/01/2019   c        2        2
01/02/2019   c        1        2
01/03/2019   c        2        2
01/04/2019   c        1        2

01/01/2019   d        1        1
01/02/2019   d        1        1
01/03/2019   d        1        1
01/04/2019   d        0        1

Answer 1

I don't think this is very elegant, but the result matches your expected output, so here it is.


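library(lubridate)
library(dplyr)
library(tidyr)

# df1: one row per customer with the group_a..group_d flags
# df2: one row per customer record (customer_id, date)

# Parse the month/day/year strings into Date values
df2$date <- mdy(df2$date)

# Attach the group flags to every record, reshape to one row per
# record and group, keep memberships only, then add the date/group
# combinations that have no records (customer_id is NA on those rows)
df2 %>% 
  inner_join(df1, by = "customer_id", copy = TRUE) %>%
  gather(key = "group", value = "member", group_a:group_d) %>%
  # assumes the flags are the strings "true"/"false";
  # use filter(member) instead if they are logical
  filter(member == "true") %>% 
  complete(date, group) %>%
  select(date, group, customer_id) ->  df3

# record: distinct customers per group and date (NA-only rows count as 0)
# total:  distinct customers per group
df3 %>%
  group_by(group, date) %>% 
  summarise(record = n_distinct(customer_id, na.rm = TRUE)) %>% 
  left_join( df3 %>%
             group_by(group) %>%
             summarise(total = n_distinct(customer_id, na.rm = TRUE)),
             by = "group") %>% ungroup() %>%
  select(date, group, record, total) -> result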

which gives:

# A tibble: 16 x 4
   date       group   record total
   <date>     <chr>    <int> <int>
 1 2019-01-01 group_a      1     1
 2 2019-01-02 group_a      1     1
 3 2019-01-03 group_a      1     1
 4 2019-01-04 group_a      1     1
 5 2019-01-01 group_b      2     2
 6 2019-01-02 group_b      1     2
 7 2019-01-03 group_b      2     2
 8 2019-01-04 group_b      0     2
 9 2019-01-01 group_c      2     2
10 2019-01-02 group_c      1     2
11 2019-01-03 group_c      2     2
12 2019-01-04 group_c      1     2
13 2019-01-01 group_d      1     1
14 2019-01-02 group_d      1     1
15 2019-01-03 group_d      1     1
16 2019-01-04 group_d      0     1
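
The answer does not show how df1 and df2 were built, so here is a self-contained sketch of the same idea that can be pasted into a fresh session. It is only one possible variant: the data frames are my reconstruction of the question's tables, the flags are assumed to be logical rather than the strings "true"/"false", the dates are typed in ISO format so lubridate is not needed, and pivot_longer() is swapped in for gather().

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the question's tables
# (df1 = group flags, df2 = customer records).
df1 <- tibble(
  customer_id = c(123, 456, 789),
  group_a = c(TRUE, FALSE, FALSE),
  group_b = c(FALSE, TRUE, TRUE),
  group_c = c(TRUE, FALSE, TRUE),
  group_d = c(FALSE, TRUE, FALSE)
)

df2 <- tibble(
  customer_id = c(rep(123, 5), rep(456, 3), rep(789, 3)),
  date = as.Date(c(
    "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-04",
    "2019-01-01", "2019-01-02", "2019-01-03",
    "2019-01-01", "2019-01-03", "2019-01-03"
  ))
)

# Customers per group, taken from the flag table alone.
totals <- df1 %>%
  pivot_longer(starts_with("group_"), names_to = "group", values_to = "member") %>%
  filter(member) %>%
  count(group, name = "total")

# Distinct customers per date and group; date/group combinations
# without any record are filled in with 0.
result <- df2 %>%
  distinct(customer_id, date) %>%
  inner_join(df1, by = "customer_id") %>%
  pivot_longer(starts_with("group_"), names_to = "group", values_to = "member") %>%
  filter(member) %>%
  count(date, group, name = "record") %>%
  complete(date, group, fill = list(record = 0L)) %>%
  left_join(totals, by = "group") %>%
  arrange(group, date)

print(result)
```

One difference from the code above: total here counts every customer flagged in df1, whereas the answer counts only flagged customers that have at least one record. On the sample data both give the same numbers.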

Answer 1
1  Accepted  2019-04-03 06:55:31
