Generating test data in R

Question

I am trying to generate this table as one of the inputs to a test.

        id                 diff          d
 1:      1                    2 2020-07-31
 2:      1                    1 2020-08-01
 3:      1                    1 2020-08-02
 4:      1                    1 2020-08-03
 5:      1                    1 2020-08-04
 6:      2                    2 2020-07-31
 7:      2                    1 2020-08-01
 8:      2                    1 2020-08-02
 9:      2                    1 2020-08-03
10:      2                    1 2020-08-04
11:      3                    2 2020-07-31
12:      3                    1 2020-08-01
13:      3                    1 2020-08-02
14:      3                    1 2020-08-03
15:      3                    1 2020-08-04
16:      4                    2 2020-07-31
17:      4                    1 2020-08-01
18:      4                    1 2020-08-02
19:      4                    1 2020-08-03
20:      4                    1 2020-08-04
21:      5                    2 2020-07-31
22:      5                    1 2020-08-01
23:      5                    1 2020-08-02
24:      5                    1 2020-08-03
25:      5                    1 2020-08-04
        id                 diff          d

I have done it like this:

library(data.table)

input1 = data.table(id=as.character(1:5), diff=1)
input1 = input1[,.(d=seq(as.Date('2020-07-31'), by='days', length.out = 5)),.(id, diff)]
input1[d == '2020-07-31']$diff = 2

diff is basically the number of days to the next weekday. E.g., 31st Jul 2020 is a Friday, hence diff is 2, which is the diff to the next weekday, Monday. For the others it will be 1.

Is there a more idiomatic R way of doing this?

I personally don't like that I had to generate the date sequence for each of the ids separately, or that I had to hardcode the diff for 31st July in the input. Is there a more generic way of doing this without the hardcoding?

Answer 1

We can create all combinations of dates and id using crossing and create the diff column based on whether the weekday is "Friday".

library(dplyr)

tidyr::crossing(id = 1:5, d = seq(as.Date('2020-07-31'), 
                          by='days', length.out = 5)) %>%
    mutate(diff = as.integer(weekdays(d) == 'Friday') + 1)

Similar logic using base R expand.grid:

transform(expand.grid(id = 1:5, 
                      d = seq(as.Date('2020-07-31'), by='days', length.out = 5)), 
          diff = as.integer(weekdays(d) == 'Friday') + 1)

and CJ in data.table:

library(data.table)
df <- CJ(id = 1:5, d = seq(as.Date('2020-07-31'), by='days', length.out = 5))
df[, diff := as.integer(weekdays(d) == 'Friday') + 1]
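
One caveat with the snippets above: weekdays() returns day names in the current locale, so the comparison with "Friday" only matches in an English locale. A minimal base R sketch of the same Friday rule that avoids the name lookup is shown here. The helper name make_input and its arguments are hypothetical, not part of the original answer, and the rule itself (2 on Fridays, 1 otherwise) is taken from the question as stated.

```r
# Hypothetical helper (not from the original answer): builds the id x date grid
# and flags Fridays without relying on locale-dependent weekday names.
make_input <- function(ids = 1:5, start = as.Date("2020-07-31"), n_days = 5) {
  dates <- seq(start, by = "day", length.out = n_days)
  # The first argument of expand.grid varies fastest, so rows come out grouped by id.
  grid <- expand.grid(d = dates, id = ids)
  # as.POSIXlt()$wday counts from 0 (Sunday) to 6 (Saturday), so Friday is 5.
  grid$diff <- ifelse(as.POSIXlt(grid$d)$wday == 5L, 2, 1)
  grid[, c("id", "diff", "d")]
}

input1 <- make_input()
```

Changing ids, start, or n_days produces a different grid with no hardcoded dates, which is the part the question wanted to avoid.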

Answer 1 (score 3, accepted, 2020-07-31 09:06:30)
