How can I reduce the code used to create a reproducible dataframe using set.seed() and sample()?

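I want to create a fairly large, reproducible dataset called Activity in order to ask a question on Stack Overflow. My data frame will consist of the following variables:

  1. DateTime: date and time with millisecond precision, at a data rate of 11 values per second, i.e. 11 rows per second.
  2. ID: identifies the individual. I want to create a dataset containing data for 3 individuals (A, B, C).
  3. x: random data ranging from -1 to +1.
  4. y: random data ranging from -1 to +1.
  5. z: random data ranging from -1 to +1.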

I originally used this code:

set.seed(100)
fmt <- "%Y-%m-%d %H:%M:%OS"

DateTime = seq(from=as.POSIXct("2017-08-05 14:03:55.300", format=fmt, tz="UTC"), by=1/11, length.out=67)
ID = rep("A", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity1<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 16:18:12.100", format=fmt, tz="UTC"),by=1/11, length.out=67)
ID = rep("B", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity2<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 20:34:31.540", format=fmt, tz="UTC"),by=1/11, length.out=67)
ID = rep("C", each=67)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)
Activity3<- data.frame(DateTime,ID, x, y, z)
Activity<- rbind(Activity1,Activity2,Activity3)

head(Activity)
                   DateTime ID     x     y     z
1 2017-08-05 14:03:55.29999  A  0.01  0.82 -0.56
2 2017-08-05 14:03:55.39090  A  0.11  0.74  0.07
3 2017-08-05 14:03:55.48182  A  0.50  0.95 -0.64
4 2017-08-05 14:03:55.57273  A  0.97 -0.89  0.95
5 2017-08-05 14:03:55.66364  A -0.97  0.78 -0.01
6 2017-08-05 14:03:55.75454  A -0.46  0.20  1.00

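How can I create the same data frame with less code? I need a reproducible data frame for another post on Stack Overflow, and other users told me I should use less code to create my example.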

There are many different ways to achieve the same result. Here is what I would do using my preferred tools:

library(data.table)
# define parameters to control the process
base_data <- fread("DateTime, ID, N
2017-08-05 14:03:55.300, A, 67
2017-08-05 16:18:12.100, B, 67
2017-08-05 20:34:31.540, C, 67")[
  , DateTime := lubridate::ymd_hms(DateTime)]
# expand sequences rowwise
Activity <- base_data[, .(DateTime = seq(from = DateTime, by = 1/11, length.out = N)), 
                      by = .(rn = seq(nrow(base_data)), ID)][
                        , rn := NULL][]
# create x, y, z columns by sampling
cols <- c("x", "y", "z")
set.seed(100)
Activity[,  (cols) := replicate(length(cols), round(runif(.N, -1, +1), 2), simplify = FALSE)]

Activity
     ID            DateTime     x     y     z
  1:  A 2017-08-05 14:03:55 -0.38  0.91 -0.28
  2:  A 2017-08-05 14:03:55 -0.48  0.83 -0.12
  3:  A 2017-08-05 14:03:55  0.10  0.65  0.61
  4:  A 2017-08-05 14:03:55 -0.89 -0.36  0.04
  5:  A 2017-08-05 14:03:55 -0.06  0.76  0.39
 ---
197:  C 2017-08-05 20:34:37 -0.76 -0.52 -0.81
198:  C 2017-08-05 20:34:37  0.20  0.44 -0.59
199:  C 2017-08-05 20:34:37 -0.76 -0.41 -0.94
200:  C 2017-08-05 20:34:37  0.58  0.02  0.16
201:  C 2017-08-05 20:34:37 -0.26 -0.44 -0.69

Fractional seconds are not printed by default, but the increments of 1/11 second can be verified with

head(diff(Activity$DateTime))
Time differences in secs
[1] 0.09090900 0.09090924 0.09090900 0.09090900 0.09090924 0.09090900
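To inspect the fractional seconds directly rather than through diff(), the timestamps can be formatted with an explicit number of decimals. The following is a minimal base R sketch; the variable names x and stamps are illustrative and not part of the original answer.

```r
# Sketch: display fractional seconds of a POSIXct vector (base R only)
fmt <- "%Y-%m-%d %H:%M:%OS"
x <- seq(from = as.POSIXct("2017-08-05 14:03:55.300", format = fmt, tz = "UTC"),
         by = 1/11, length.out = 3)

# "%OS3" formats the seconds with three decimal places
stamps <- format(x, "%Y-%m-%d %H:%M:%OS3")
print(stamps)

# Alternatively, change the print option for the current session
old <- options(digits.secs = 3)
print(x)
options(old)  # restore the previous setting
```

Because POSIXct stores times as floating-point seconds, the last printed digit can differ slightly from the value typed in, as the first row of the question's output (14:03:55.29999) suggests.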

Since the OP did not ask for his results to be reproduced exactly with the given seed value, I have replaced

sample(seq(from = -1, to = 1, by = 0.01), size = 67, replace = TRUE)

with

round(runif(.N, -1, +1), 2)

If sample() is preferred, the seq() part can be skipped:

sample((-100:100)/100, .N, replace = TRUE)
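As a quick check of the shorthand above, the sketch below compares the two ways of building the grid of candidate values. The names grid_seq, grid_int, a and b are illustrative only; the vector length 67 is taken from the question.

```r
# Two ways to build the same grid of 201 candidate values
grid_seq <- seq(from = -1, to = 1, by = 0.01)
grid_int <- (-100:100) / 100

# Compare up to floating-point tolerance (the values need not be bit-identical)
same_grid <- isTRUE(all.equal(grid_seq, grid_int))
print(same_grid)

# With the same seed, sample() draws the same positions from either grid
set.seed(100)
a <- sample(grid_seq, size = 67, replace = TRUE)
set.seed(100)
b <- sample(grid_int, size = 67, replace = TRUE)
same_draws <- isTRUE(all.equal(a, b))
print(same_draws)
```

Note that round(runif(.N, -1, +1), 2) is not an exact substitute for sampling from this grid: the end points -1 and +1 each receive only half the probability of the interior values, which is unlikely to matter for a toy dataset.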

Using data.table chaining, the code can be written more concisely as

library(data.table)
cols <- c("x", "y", "z")
set.seed(100)
Activity <- fread("DateTime, ID, N
2017-08-05 14:03:55.300, A, 67
2017-08-05 16:18:12.100, B, 67
2017-08-05 20:34:31.540, C, 67")[
  , DateTime := lubridate::ymd_hms(DateTime)][
    , rn := .I][  # row number; base_data is not defined in this chained version
    , .(DateTime = seq(from = DateTime, by = 1/11, length.out = N)), 
    by = .(rn, ID)][
      ,  (cols) := replicate(length(cols), round(runif(.N, -1, +1), 2), simplify = FALSE)][
        , rn := NULL][]
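For readers who prefer to stay in base R, the same idea (describe each individual once, then expand) can be sketched without data.table. The helper make_block and the named vector starts are hypothetical names introduced here for illustration; they do not appear in the original answer.

```r
# Hypothetical helper (an assumption, not from the answer): one block per individual
make_block <- function(start, id, n, rate = 11) {
  fmt <- "%Y-%m-%d %H:%M:%OS"
  grid <- (-100:100) / 100  # candidate values from -1 to +1 in steps of 0.01
  data.frame(
    DateTime = seq(from = as.POSIXct(start, format = fmt, tz = "UTC"),
                   by = 1 / rate, length.out = n),
    ID = id,
    x = sample(grid, n, replace = TRUE),
    y = sample(grid, n, replace = TRUE),
    z = sample(grid, n, replace = TRUE)
  )
}

# Start time of each individual's recording, named by ID
starts <- c(A = "2017-08-05 14:03:55.300",
            B = "2017-08-05 16:18:12.100",
            C = "2017-08-05 20:34:31.540")

set.seed(100)
# Build one data frame per individual and stack them
Activity <- do.call(rbind, Map(make_block, start = starts, id = names(starts), n = 67))
rownames(Activity) <- NULL  # drop the "A.1", "A.2", ... row names added by rbind
```

Adding a fourth individual then only requires one more entry in starts, which is the main point of the refactoring.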
