How to count unique IDs across different dates in R?

Question

I am a beginner in R, so I apologise in advance if the question seems dumb, if there is an obvious solution, or if it has already been solved somewhere else...

I have a data frame (df) of purchases, each with a date and a client ID:


  ANNEE    Date clientID
1  2017 2017-01      aaa
2  2017 2017-01      bbb
3  2018 2018-01      aaa
4  2018 2018-02      aaa
5  2018 2018-01      bbb
6  2019 2019-01      aaa
7  2019 2019-01      ccc
8  2020 2020-01      ddd
9  2020 2020-01      ccc

For each year, I would like to know what proportion of that year's clients were already present in my df the previous year. In this example, that would look like:

dfObjective
Date   Prop
2017     0
2018     1
2019   0.5
2020   0.5
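Put as a formula, writing $C_y$ (notation introduced here for illustration) for the set of distinct clientIDs with at least one purchase in year $y$, and taking the year before the first one to have no clients:

```latex
\mathrm{Prop}_y = \frac{\lvert C_y \cap C_{y-1} \rvert}{\lvert C_y \rvert},
\qquad \text{e.g. } \mathrm{Prop}_{2019}
= \frac{\lvert \{aaa, ccc\} \cap \{aaa, bbb\} \rvert}{\lvert \{aaa, ccc\} \rvert}
= \frac{1}{2}.
```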

I thought the first step would be to reshape my df to count the number of distinct clients present in a given year, regardless of how many purchases they made, and I have done that (though I'm sure there is a better way to do it):

library(plyr)
# One row per year: the number of distinct clientIDs in that year
df2 <- ddply(df, "ANNEE", summarise, Count = length(unique(clientID)))

df2
ANNEE Count
2017     2
2018     2
2019     2
2020     2
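The counting step can also be done in base R without plyr. A minimal sketch using tapply(), rebuilding the example data with data.frame() (the Date column is left out because it is not needed, and the name counts is just illustrative):

```r
# Example data from the question (Date column omitted)
df <- data.frame(
  ANNEE    = c(2017, 2017, 2018, 2018, 2018, 2019, 2019, 2020, 2020),
  clientID = c("aaa", "bbb", "aaa", "aaa", "bbb", "aaa", "ccc", "ddd", "ccc")
)

# For each year, count distinct clientIDs (repeat purchases are ignored)
counts <- tapply(df$clientID, df$ANNEE, function(ids) length(unique(ids)))

# tapply() returns a named array; turn it into a two-column data frame
df2 <- data.frame(ANNEE = as.numeric(names(counts)), Count = as.vector(counts))
df2
```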

However, I can't find how to compute the proportion of clients who had already made at least one purchase the previous year...

Thank you in advance!

Answer 1

Here is a tidyverse solution.

First, keep one row per client and year, then group by clientID to determine which clients were also present in the previous year. Then, group by year to compute the proportions.

library(tidyverse)

df <- read_table2("
 ANNEE    Date clientID
 2017 2017-01      aaa
 2017 2017-01      bbb
 2018 2018-01      aaa
 2018 2018-02      aaa
 2018 2018-01      bbb
 2019 2019-01      aaa
 2019 2019-01      ccc
 2020 2020-01      ddd
 2020 2020-01      ccc
")

df %>%
  distinct(clientID, ANNEE) %>%
  group_by(clientID) %>%
  mutate(in_previous_year = (ANNEE - 1) %in% ANNEE) %>%
  group_by(ANNEE) %>%
  summarise(Prop = sum(in_previous_year) / n())
#> # A tibble: 4 x 2
#>   ANNEE  Prop
#>   <dbl> <dbl>
#> 1  2017   0  
#> 2  2018   1  
#> 3  2019   0.5
#> 4  2020   0.5
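For readers not using the tidyverse, the same two steps can be sketched in base R: unique() plays the role of distinct(), ave() applies the previous-year check within each client, and tapply() averages the flags per year. This is a translation of the approach above, not part of the original answer; the names pairs, props and res are illustrative.

```r
# Example data from the question (Date column omitted)
df <- data.frame(
  ANNEE    = c(2017, 2017, 2018, 2018, 2018, 2019, 2019, 2020, 2020),
  clientID = c("aaa", "bbb", "aaa", "aaa", "bbb", "aaa", "ccc", "ddd", "ccc")
)

# Step 1: one row per (client, year), ignoring repeat purchases
pairs <- unique(df[, c("clientID", "ANNEE")])

# Step 2: within each client, flag years whose previous year is also present.
# ave() writes the result back into a numeric vector, hence as.logical().
pairs$in_previous_year <- as.logical(
  ave(pairs$ANNEE, pairs$clientID, FUN = function(y) (y - 1) %in% y)
)

# Step 3: per year, the share of flagged clients
props <- tapply(pairs$in_previous_year, pairs$ANNEE, mean)
res <- data.frame(ANNEE = as.numeric(names(props)), Prop = as.vector(props))
res
```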

Answer 2

Base R:

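# split() groups the unique clients of each year (groups ordered by year);
# rbind() stacks them into a year-by-position matrix, and duplicated(),
# applied column by column, flags IDs that already appear in an earlier row.
# rowMeans() then gives the flagged share per year, so the years are listed
# with sort() to match the row order.
data.frame(ANNEE = sort(unique(df$ANNEE)), prop =
             rowMeans(apply(do.call(
               rbind, lapply(with(df[order(df$ANNEE), ],
                                  split(clientID, ANNEE)),
                             unique)
             ), 2, duplicated)))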
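The matrix trick above gives the expected result on the example, but it relies on every year having the same number of distinct clients listed in matching positions, and duplicated() also flags clients seen in any earlier year rather than only the previous one. A sketch that instead compares each year's client set with the previous year's set directly (the names clients_by_year, years, prop and res are hypothetical):

```r
# Example data from the question (Date column omitted)
df <- data.frame(
  ANNEE    = c(2017, 2017, 2018, 2018, 2018, 2019, 2019, 2020, 2020),
  clientID = c("aaa", "bbb", "aaa", "aaa", "bbb", "aaa", "ccc", "ddd", "ccc")
)

# Distinct clients per year; split() orders the groups by year
clients_by_year <- lapply(split(df$clientID, df$ANNEE), unique)
years <- as.numeric(names(clients_by_year))

# Share of each year's clients also present in the immediately previous year.
# A missing previous year looks up NULL, so every %in% test is FALSE.
prop <- sapply(seq_along(years), function(i) {
  prev <- clients_by_year[[as.character(years[i] - 1)]]
  mean(clients_by_year[[i]] %in% prev)
})

res <- data.frame(ANNEE = years, prop = prop)
res
```

Looking the previous year up by name, rather than by position, means a gap year (for example no purchases at all in 2018) yields 0 for 2019 instead of comparing against 2017.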

Answer 1: score 2, accepted, 2020-11-16 12:38:02

Answer 2: score 0, 2020-11-16 13:15:54
