

Calculate number of occurrences within a specific time period

I have the following data, where ID identifies an individual, Date is the date, and Purchased indicates whether that person made a purchase (I added this last column so that I can count the occurrences):

   ID       Date Purchased
1   1 2017-01-01         1
2   1 2017-08-03         1
3   1 2017-09-02         1
4   2 2017-09-04         1
5   2 2018-07-12         1
6   2 2018-11-03         1
7   2 2018-12-05         1
8   2 2019-01-01         1
9   3 2018-02-03         1
10  3 2020-02-03         1
11  3 2020-03-01         1

I would like to create a variable called "Frequency" that counts the number of purchases an individual made in the past year, by summing that individual's "Purchased" values dated within the year before the Date on each row of the data frame.
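
Written as a formula (the symbols are introduced here only for illustration: d_i is the Date on row i, p_j is the Purchased value on row j, and the window is the one-year look-back described in this question):

```latex
\mathrm{Frequency}_i \;=\; \sum_{\substack{j \,:\, \mathrm{ID}_j = \mathrm{ID}_i \\ d_i - 1\ \text{year} \,\le\, d_j \,<\, d_i}} p_j
```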

So, for example, row 3 should get a "Frequency" of 2, since 2017-01-01 and 2017-08-03 both fall within the one-year period before 2017-09-02 (i.e. within the interval from 2016-09-02 to 2017-09-01).
See the desired output:
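
As a quick sanity check of that arithmetic, here is a minimal base R sketch for ID 1 only. It assumes a 365-day window, which gives the same bounds as a calendar year here because no 29 February falls between 2016-09-02 and 2017-09-02.

```r
# The three purchases of ID 1, taken from the data above
dates  <- as.Date(c("2017-01-01", "2017-08-03", "2017-09-02"))
target <- dates[3]               # row 3: 2017-09-02

# Look-back window: [target - 365 days, target - 1 day]
window_start <- target - 365     # 2016-09-02
window_end   <- target - 1       # 2017-09-01

# Count the purchases that fall inside the window
in_window <- dates >= window_start & dates <= window_end
sum(in_window)
# [1] 2
```
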

   ID       Date Purchased Frequency
1   1 2017-01-01         1         0
2   1 2017-08-03         1         1
3   1 2017-09-02         1         2
4   2 2017-09-04         1         0
5   2 2018-07-12         1         1
6   2 2018-11-03         1         1
7   2 2018-12-05         1         2
8   2 2019-01-01         1         3
9   3 2018-02-03         1         0
10  3 2020-02-03         1         0
11  3 2020-03-01         1         1

To reproduce the data frame:

df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
                 Date = as.Date(c('2017-01-01', '2017-08-03', '2017-09-02', '2017-09-04',
                                  '2018-07-12', '2018-11-03', '2018-12-05', '2019-01-01',
                                  '2018-02-03', '2020-02-03', '2020-03-01')),
                 Purchased = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

I've searched Stack Overflow but haven't yet found an answer that I can apply to my situation to get the desired results. One of the things I found and tried was this:

# Counts rows dated within the 365 days before each Date,
# but across all IDs rather than per ID
df$frequency <-
  sapply(df$Date, function(x){
    sum(df$Date < x & df$Date >= x - 365)
  })

I believe this might give me the results I want if I can find a way to make it group by ID (so that it sums per ID rather than over the whole data frame). I can't say for sure, of course, since I haven't been able to test it. Any help is much appreciated.
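
For reference, one way to add that grouping to the attempt above, sketched in base R (this is not taken from the answers below, and it keeps the attempt's 365-day window): loop over row numbers instead of dates, so each row can be compared with its own ID as well as its own Date.

```r
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-08-03", "2017-09-02", "2017-09-04",
                   "2018-07-12", "2018-11-03", "2018-12-05", "2019-01-01",
                   "2018-02-03", "2020-02-03", "2020-03-01")),
  Purchased = 1
)

# Same window as the sapply() attempt, plus a same-ID condition per row
df$Frequency <- sapply(seq_len(nrow(df)), function(i) {
  same_id   <- df$ID == df$ID[i]
  in_window <- df$Date < df$Date[i] & df$Date >= df$Date[i] - 365
  sum(df$Purchased[same_id & in_window])
})

df$Frequency
# [1] 0 1 2 0 1 1 2 3 0 0 1
```
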

Here's a tidyverse solution:

library(dplyr)
library(purrr)
library(lubridate)

df %>%
  group_by(ID) %>%
  # for each Date, sum Purchased over [Date - 1 year, Date - 1 day] within the same ID
  mutate(Frequency = map_dbl(Date, 
                     ~sum(Purchased[between(Date, .x - years(1), .x - 1)]))) %>%
  ungroup()

#      ID Date       Purchased Frequency
#   <dbl> <date>         <dbl>     <dbl>
# 1     1 2017-01-01         1         0
# 2     1 2017-08-03         1         1
# 3     1 2017-09-02         1         2
# 4     2 2017-09-04         1         0
# 5     2 2018-07-12         1         1
# 6     2 2018-11-03         1         1
# 7     2 2018-12-05         1         2
# 8     2 2019-01-01         1         3
# 9     3 2018-02-03         1         0
#10     3 2020-02-03         1         0
#11     3 2020-03-01         1         1

The logic of the code is: for every Date within each ID, it sums the Purchased values between the current date minus 1 year and the current date minus 1 day.
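
The same windowing can be sketched in base R without purrr or lubridate. This is an illustrative alternative, not part of the answer: `one_year_back()` and `count_prior_year()` are hypothetical helper names, `seq()` with `by = "-1 year"` stands in for `.x - years(1)`, and `ave()` plays the role of `group_by(ID)`.

```r
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-08-03", "2017-09-02", "2017-09-04",
                   "2018-07-12", "2018-11-03", "2018-12-05", "2019-01-01",
                   "2018-02-03", "2020-02-03", "2020-03-01")),
  Purchased = 1
)

# Hypothetical helper: the date one calendar year before d (second element of a
# backwards yearly sequence), standing in for lubridate's `d - years(1)`
one_year_back <- function(d) seq(d, by = "-1 year", length.out = 2)[2]

# Hypothetical helper: for the rows of one ID (given as row indices), sum
# Purchased over [Date - 1 year, Date - 1 day], like between(Date, .x - years(1), .x - 1)
count_prior_year <- function(idx) {
  dates  <- df$Date[idx]
  bought <- df$Purchased[idx]
  vapply(seq_along(dates), function(k) {
    d <- dates[k]
    sum(bought[dates >= one_year_back(d) & dates <= d - 1])
  }, numeric(1))
}

# ave() calls count_prior_year() once per ID on that ID's row indices
df$Frequency <- ave(seq_len(nrow(df)), df$ID, FUN = count_prior_year)

df$Frequency
# [1] 0 1 2 0 1 1 2 3 0 0 1
```
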

You could use non-equi joins with data.table:

library(data.table)

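setDT(df)
# Before = start of each row's 365-day look-back window
df[,c("Date","Before"):=.(as.Date(Date),as.Date(Date)-365)]
# Match each row to same-ID rows dated in [Before, Date]; .N - 1 drops the self-match
df[df,.(ID, Date),on=.(ID=ID, Date>=Before, Date<=Date)][,.N-1,by=.(ID,Date)]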

   ID       Date V1
 1:  1 2017-01-01  0
 2:  1 2017-08-03  1
 3:  1 2017-09-02  2
 4:  2 2017-09-04  0
 5:  2 2018-07-12  1
 6:  2 2018-11-03  1
 7:  2 2018-12-05  2
 8:  2 2019-01-01  3
 9:  3 2018-02-03  0
10:  3 2020-02-03  0
11:  3 2020-03-01  1
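
To illustrate what the non-equi join computes, here is a base R emulation. It is a swapped-in technique, not how data.table executes the join, and it is only suitable for small data because it materialises every same-ID pair: `merge()` performs the equi-join on `ID`, a logical filter applies the two date inequalities from `on=`, and `tabulate()` counts the matches per row before subtracting the self-match, like `.N - 1`. The `row` column is a helper added for this sketch.

```r
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3),
  Date = as.Date(c("2017-01-01", "2017-08-03", "2017-09-02", "2017-09-04",
                   "2018-07-12", "2018-11-03", "2018-12-05", "2019-01-01",
                   "2018-02-03", "2020-02-03", "2020-03-01")),
  Purchased = 1
)
df$Before <- df$Date - 365       # start of each row's look-back window
df$row    <- seq_len(nrow(df))   # helper: identifies the row being scored

# Equi-join on ID: every row paired with every row of the same ID.
# ".x" columns are the candidate purchases, ".y" columns the row being scored.
pairs <- merge(df, df, by = "ID")

# Non-equi conditions from on=: Before.y <= Date.x <= Date.y
pairs <- pairs[pairs$Date.x >= pairs$Before.y & pairs$Date.x <= pairs$Date.y, ]

# Matches per scored row, minus the row's match with itself
df$Frequency <- tabulate(pairs$row.y, nbins = nrow(df)) - 1

df$Frequency
# [1] 0 1 2 0 1 1 2 3 0 0 1
```

In this sketch, subtracting one assumes each ID has at most one row per date; two same-day rows would each count the other.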

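Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.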

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM