Counting the number of instances between dates

Suppose that I have the following dataset:

library(data.table)
library(lubridate)

store_DT <- data.table(date = rep(seq.Date(from = as.Date("2019-10-01"),
                                           to = as.Date("2019-10-05"),
                                           by = "day"),
                                  times = 2),  # one run of dates per store
                       store = c(rep("A",5),rep("B",5)))

    date         store
 1: 2019-10-01     A
 2: 2019-10-02     A
 3: 2019-10-03     A
 4: 2019-10-04     A
 5: 2019-10-05     A
 6: 2019-10-01     B
 7: 2019-10-02     B
 8: 2019-10-03     B
 9: 2019-10-04     B
10: 2019-10-05     B


This is simply a data.table of store x date observations.

Suppose I have another data.table of employee start and end dates (inclusive):

roster_DT <- data.table(
  store = c("A", "A", "A", "A", "B", "B","B", "B"),
  employee_ID = 1:8,
  start_date = c("2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04",
                 "2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04"),
  end_date = c("2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06",
               "2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06")
)

   store employee_ID start_date   end_date
1:     A           1 2019-09-30 2019-10-04
2:     A           2 2019-10-02 2019-10-04
3:     A           3 2019-10-03 2019-10-05
4:     A           4 2019-10-04 2019-10-06
5:     B           5 2019-09-30 2019-10-04
6:     B           6 2019-10-02 2019-10-04
7:     B           7 2019-10-03 2019-10-05
8:     B           8 2019-10-04 2019-10-06

What I want to do is count the number of employees that each store has on any given date, and bring this back to store_DT. The complication is that roster_DT specifies a range of dates. One solution is to expand roster_DT using the advice here. But the actual dataset is quite large, so expanding it is not efficient or feasible. So I was wondering if there are any other approaches.

The final dataset I am looking for is:

    date         store   employees
 1: 2019-10-01     A      1
 2: 2019-10-02     A      2
 3: 2019-10-03     A      3
 4: 2019-10-04     A      4
 5: 2019-10-05     A      2
 6: 2019-10-01     B      1
 7: 2019-10-02     B      2
 8: 2019-10-03     B      3
 9: 2019-10-04     B      4
10: 2019-10-05     B      2

There are many stores and many employees in my dataset, so I am hoping for a data.table solution.

Thank you so much!

Please find below a solution (reprex) using the lubridate library and the foverlaps() function of the data.table library.

Reprex

  • Code
library(data.table)
library(lubridate)

# Convert the 'start_date' and 'end_date' columns from character to class 'Date'
sel_cols <- c("start_date", "end_date")
roster_DT[, (sel_cols) := lapply(.SD, ymd), .SDcols = sel_cols]

# Create a dummy variable in the data.table 'store_DT'
store_DT[, dummy := date]

# Key 'roster_DT' by store, then by the interval columns (the interval must come last)
setkey(roster_DT, store, start_date, end_date)

# Overlap join within each store, then count the matching roster rows per date and store.
# sum(!is.na(employee_ID)) gives 0, not 1, for dates on which no employee is rostered.
Results <- foverlaps(store_DT, roster_DT, by.x = c("store", "date", "dummy"), type = "within")[
  , .(employees = sum(!is.na(employee_ID))), by = .(date, store)]

# Reorder the data.table 'Results' by 'store', then 'date'         
setorder(Results, store, date)

  • Output

Results
#>           date store employees
#>  1: 2019-10-01     A         1
#>  2: 2019-10-02     A         2
#>  3: 2019-10-03     A         3
#>  4: 2019-10-04     A         4
#>  5: 2019-10-05     A         2
#>  6: 2019-10-01     B         1
#>  7: 2019-10-02     B         2
#>  8: 2019-10-03     B         3
#>  9: 2019-10-04     B         4
#> 10: 2019-10-05     B         2

Created on 2021-11-17 by the reprex package (v2.0.1)
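
For comparison, here is a minimal sketch of another way to get the same counts without expanding roster_DT: a data.table non-equi join. This is not the method from the answer above, just an alternative under the assumption that the date columns are already of class Date. The intermediate name counts is made up for illustration; the data is the sample data from the question, rebuilt so the snippet runs on its own.

```r
library(data.table)

# Sample data from the question, with the dates stored as class Date
store_DT <- data.table(
  date  = rep(seq.Date(as.Date("2019-10-01"), as.Date("2019-10-05"), by = "day"),
              times = 2),
  store = rep(c("A", "B"), each = 5)
)

roster_DT <- data.table(
  store       = rep(c("A", "B"), each = 4),
  employee_ID = 1:8,
  start_date  = as.Date(rep(c("2019-09-30", "2019-10-02",
                              "2019-10-03", "2019-10-04"), times = 2)),
  end_date    = as.Date(rep(c("2019-10-04", "2019-10-04",
                              "2019-10-05", "2019-10-06"), times = 2))
)

# Non-equi join: for each row of store_DT, match the roster rows of the same
# store whose [start_date, end_date] interval contains that row's date.
# by = .EACHI evaluates .N once per row of store_DT, in store_DT's row order.
counts <- roster_DT[store_DT,
                    on = .(store, start_date <= date, end_date >= date),
                    .N,
                    by = .EACHI]

# Bring the counts back to store_DT by reference
store_DT[, employees := counts$N]

print(store_DT)
```

Because the store column is part of the join condition, each store is counted against its own roster only, and nothing is expanded to one row per employee per day.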
