
Generating test data in R

I am trying to generate this table as one of the inputs to a test.

        id                 diff          d
 1:      1                    2 2020-07-31
 2:      1                    1 2020-08-01
 3:      1                    1 2020-08-02
 4:      1                    1 2020-08-03
 5:      1                    1 2020-08-04
 6:      2                    2 2020-07-31
 7:      2                    1 2020-08-01
 8:      2                    1 2020-08-02
 9:      2                    1 2020-08-03
10:      2                    1 2020-08-04
11:      3                    2 2020-07-31
12:      3                    1 2020-08-01
13:      3                    1 2020-08-02
14:      3                    1 2020-08-03
15:      3                    1 2020-08-04
16:      4                    2 2020-07-31
17:      4                    1 2020-08-01
18:      4                    1 2020-08-02
19:      4                    1 2020-08-03
20:      4                    1 2020-08-04
21:      5                    2 2020-07-31
22:      5                    1 2020-08-01
23:      5                    1 2020-08-02
24:      5                    1 2020-08-03
25:      5                    1 2020-08-04
        id                 diff          d

I have done it like this:

input1 = data.table(id=as.character(1:5), diff=1)
input1 = input1[,.(d=seq(as.Date('2020-07-31'), by='days', length.out = 5)),.(id, diff)]
input1[d == '2020-07-31']$diff = 2

diff is basically the number of days to the next weekday. E.g., 31st Jul 2020 is a Friday, hence diff is 2, which is the diff to the next weekday, Monday. For the other dates it will be 1.

  • Is there a more idiomatic R way of doing this?

I personally don't like that I had to generate the date sequence for each of the ids separately, or that I had to hardcode the diff for 31st July in the input. Is there a more generic way of doing this without the hardcoding?

We can create all combinations of dates and id using crossing, and create the diff column based on whether the weekday is "Friday".

library(dplyr)

tidyr::crossing(id = 1:5, d = seq(as.Date('2020-07-31'), 
                          by='days', length.out = 5)) %>%
    mutate(diff = as.integer(weekdays(d) == 'Friday') + 1)

Similar logic using base R expand.grid:

transform(expand.grid(id = 1:5, 
                      d = seq(as.Date('2020-07-31'), by='days', length.out = 5)), 
          diff = as.integer(weekdays(d) == 'Friday') + 1)

and CJ in data.table:

library(data.table)
df <- CJ(id = 1:5, d = seq(as.Date('2020-07-31'), by='days', length.out = 5))
df[, diff := as.integer(weekdays(d) == 'Friday') + 1]
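
One caveat with the approaches above: weekdays() returns day names in the session locale, so comparing against 'Friday' may silently fail on a non-English system. A minimal sketch of a locale-independent variant, assuming only base R and using the numeric weekday from as.POSIXlt() (wday runs 0-6 starting on Sunday, so Friday is 5). The names dates and input1 are just illustrative:

```r
# Build every id/date combination with base R only.
dates <- seq(as.Date("2020-07-31"), by = "day", length.out = 5)
input1 <- expand.grid(id = 1:5, d = dates)

# as.POSIXlt()$wday is the weekday as a number: 0 = Sunday ... 5 = Friday.
# Unlike weekdays(), it does not depend on the session locale.
input1$diff <- ifelse(as.POSIXlt(input1$d)$wday == 5, 2L, 1L)

# Order by id, then date, to match the layout of the table in the question.
input1 <- input1[order(input1$id, input1$d), ]
```

The same wday test can be dropped into the mutate(), transform(), or := calls in place of weekdays(d) == 'Friday'.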
