How to count unique IDs at different dates in R?

I am a beginner in R, so I apologise in advance if the question seems dumb, has an obvious solution, or has already been answered somewhere else...

I have a data frame, df, of purchases, each with its year (ANNEE), month (Date) and client ID:


  ANNEE    Date clientID
1  2017 2017-01      aaa
2  2017 2017-01      bbb
3  2018 2018-01      aaa
4  2018 2018-02      aaa
5  2018 2018-01      bbb
6  2019 2019-01      aaa
7  2019 2019-01      ccc
8  2020 2020-01      ddd
9  2020 2020-01      ccc

For each year, I would like to know what proportion of that year's clients were already present in my df the previous year. In this example, the result would look like this:

dfObjective
Date   Prop
2017     0
2018     1
2019   0.5
2020   0.5
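Put formally, if C_y denotes the set of distinct clients with at least one purchase in year y (notation introduced here only for illustration), the target for each year is:

$$\mathrm{Prop}(y) = \frac{|C_y \cap C_{y-1}|}{|C_y|}$$

For example, C_2019 = {aaa, ccc} and C_2018 = {aaa, bbb}, so Prop(2019) = |{aaa}| / 2 = 0.5. For 2017 there is no earlier data, so C_2016 is empty and Prop(2017) = 0.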

I thought the first step would be to reshape my df to count the number of distinct clients present in each year, regardless of how many purchases they made. I have managed that (though I'm sure there is a better way to do it):

library(plyr)
# Count each client only once per year, however many purchases they made
df2 <- ddply(df, "ANNEE", summarise, Count = length(unique(clientID)))

df2
ANNEE Count
2017     2
2018     2
2019     2
2020     2
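As the question suspects, the per-year count can be written more directly. Below is a minimal sketch with dplyr (not part of the original post) that rebuilds the example data with data.frame() so it runs on its own; n_distinct() counts each client once per year:

```r
library(dplyr)

# Example data from the question (ANNEE = year)
df <- data.frame(
  ANNEE    = c(2017, 2017, 2018, 2018, 2018, 2019, 2019, 2020, 2020),
  Date     = c("2017-01", "2017-01", "2018-01", "2018-02", "2018-01",
               "2019-01", "2019-01", "2020-01", "2020-01"),
  clientID = c("aaa", "bbb", "aaa", "aaa", "bbb", "aaa", "ccc", "ddd", "ccc")
)

# One row per year with the number of distinct clients in that year
df2 <- df %>%
  group_by(ANNEE) %>%
  summarise(Count = n_distinct(clientID))
df2
```

Each year in the example has two distinct clients, matching the df2 table above.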

However, I can't figure out how to compute the proportion of clients who had already made at least one purchase in the previous year...

Thank you in advance!

Here is a tidyverse solution.

First, keep one row per client and year, and group by clientID to flag whether each client was also present in the previous year. Then group by year to compute the proportions.

library(tidyverse)

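# Recreate the example data; in readr >= 2.0, read_table2() is deprecated in favour of read_table()
df <- read_table2("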
 ANNEE    Date clientID
 2017 2017-01      aaa
 2017 2017-01      bbb
 2018 2018-01      aaa
 2018 2018-02      aaa
 2018 2018-01      bbb
 2019 2019-01      aaa
 2019 2019-01      ccc
 2020 2020-01      ddd
 2020 2020-01      ccc
")

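df %>%
  # one row per client per year, whatever the number of purchases
  distinct(clientID, ANNEE) %>%
  group_by(clientID) %>%
  # TRUE when the same client also appears in the year before
  mutate(in_previous_year = (ANNEE - 1) %in% ANNEE) %>%
  group_by(ANNEE) %>%
  # share of that year's clients flagged TRUE
  summarise(Prop = sum(in_previous_year) / n())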
#> # A tibble: 4 x 2
#>   ANNEE  Prop
#>   <dbl> <dbl>
#> 1  2017   0  
#> 2  2018   1  
#> 3  2019   0.5
#> 4  2020   0.5
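The same proportions can also be obtained with a join instead of the per-client %in% test: shift every client-year forward by one year and match it back against the original client-years. This is a sketch of that alternative technique, not part of the answer above; the intermediate names visits and shifted are illustrative.

```r
library(dplyr)

# Example data from the question (ANNEE = year)
df <- data.frame(
  ANNEE    = c(2017, 2017, 2018, 2018, 2018, 2019, 2019, 2020, 2020),
  Date     = c("2017-01", "2017-01", "2018-01", "2018-02", "2018-01",
               "2019-01", "2019-01", "2020-01", "2020-01"),
  clientID = c("aaa", "bbb", "aaa", "aaa", "bbb", "aaa", "ccc", "ddd", "ccc")
)

# One row per client per year, regardless of purchase count
visits <- distinct(df, clientID, ANNEE)

# Relabel each visit as belonging to the following year, so a match on
# (clientID, ANNEE) means "this client was present the year before"
shifted <- visits %>%
  mutate(ANNEE = ANNEE + 1, in_previous_year = TRUE)

result <- visits %>%
  left_join(shifted, by = c("clientID", "ANNEE")) %>%
  # unmatched rows get NA, i.e. the client was absent the year before
  mutate(in_previous_year = !is.na(in_previous_year)) %>%
  group_by(ANNEE) %>%
  summarise(Prop = mean(in_previous_year))
result
```

The join keeps the lookback explicit: shifting by 2 instead of 1 would check presence exactly two years earlier, at the cost of building an extra table.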

A base R alternative, which compares each year's distinct clients with the previous year's set:

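# Distinct client IDs per year; split() orders the years increasingly
clients_by_year <- lapply(split(df$clientID, df$ANNEE), unique)
years <- as.numeric(names(clients_by_year))
# Share of each year's clients also present the year before; when the previous
# year is absent from the data the lookup returns NULL, so %in% is all FALSE
prop <- vapply(seq_along(years), function(i) {
  prev_ids <- clients_by_year[[as.character(years[i] - 1)]]
  mean(clients_by_year[[i]] %in% prev_ids)
}, numeric(1))
data.frame(ANNEE = years, prop = prop)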

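Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this page, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.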
