Vectorize an R for loop in data.table
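I'm building a maintenance scheduler in R. For different machines, I have routines of specific activities that should be performed on specific dates, defined by a frequency and a start date.
I already have a data.table with the frequency (in weeks), the last known date of a major maintenance, and the projected dates of each routine, derived from its frequency and last date. A simplified version looks like this: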
require(data.table)
dt <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), machine = c("t1",
"t1", "t1", "t1", "t1", "t2", "t2", "t2", "t2"), frequencyWeeks = c(4,
12, 24, 48, 96, 4, 24, 48, 96), lastMaintenance = structure(c(17889,
17889, 17889, 17889, 17889, 17871, 17871, 17871, 17871), class = "Date"),
datesRoutines = list(structure(c(17889, 17917, 17945, 17973,
18001, 18029, 18057, 18085, 18113, 18141, 18169, 18197, 18225,
18253, 18281, 18309, 18337, 18365, 18393, 18421, 18449, 18477,
18505, 18533, 18561, 18589, 18617), class = "Date"), structure(c(17889,
17973, 18057, 18141, 18225, 18309, 18393, 18477, 18561), class = "Date"),
structure(c(17889, 18057, 18225, 18393, 18561), class = "Date"),
structure(c(17889, 18225, 18561), class = "Date"), structure(c(17889,
18561), class = "Date"), structure(c(17871, 17899, 17927,
17955, 17983, 18011, 18039, 18067, 18095, 18123, 18151,
18179, 18207, 18235, 18263, 18291, 18319, 18347, 18375,
18403, 18431, 18459, 18487, 18515, 18543, 18571, 18599,
18627), class = "Date"), structure(c(17871, 18039, 18207,
18375, 18543), class = "Date"), structure(c(17871, 18207,
18543), class = "Date"), structure(c(17871, 18543), class = "Date"))), class = c("data.table",
"data.frame"), row.names = c(NA, -9L))
dt
   id machine frequencyWeeks lastMaintenance datesRoutines
1:  1      t1              4      2018-12-24 2018-12-24,2019-01-21,2019-02-18,2019-03-18,2019-04-15,2019-05-13,...
2:  2      t1             12      2018-12-24 2018-12-24,2019-03-18,2019-06-10,2019-09-02,2019-11-25,2020-02-17,...
3:  3      t1             24      2018-12-24 2018-12-24,2019-06-10,2019-11-25,2020-05-11,2020-10-26
4:  4      t1             48      2018-12-24 2018-12-24,2019-11-25,2020-10-26
5:  5      t1             96      2018-12-24 2018-12-24,2020-10-26
6:  6      t2              4      2018-12-06 2018-12-06,2019-01-03,2019-01-31,2019-02-28,2019-03-28,2019-04-25,...
7:  7      t2             24      2018-12-06 2018-12-06,2019-05-23,2019-11-07,2020-04-23,2020-10-08
8:  8      t2             48      2018-12-06 2018-12-06,2019-11-07,2020-10-08
9:  9      t2             96      2018-12-06 2018-12-06,2020-10-08
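The dates in datesRoutines are consistent with stepping forward from lastMaintenance in increments of frequencyWeeks. As a rough sketch of how such a column could be produced, the helper below uses base R's seq() on Date objects. The function name project_dates and the horizon of 2020-12-31 are assumptions made for illustration; the post does not say how the dates were actually generated.

```r
# Hypothetical helper (not from the original post): project routine dates
# from a start date, stepping by a frequency in weeks up to a horizon.
project_dates <- function(start, frequency_weeks, horizon) {
  seq(start, horizon, by = paste(frequency_weeks, "weeks"))
}

start   <- as.Date("2018-12-24")  # lastMaintenance of machine t1
horizon <- as.Date("2020-12-31")  # assumed planning horizon

# Projected dates for a routine that runs every 24 weeks
dates_24 <- project_dates(start, 24, horizon)
print(dates_24)
```

With these inputs the result should line up with the datesRoutines entry of routine 3 in the table above.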
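Need: For each machine and intervention date, I want to determine the routine with the highest id (routines are recorded in order of increasing complexity, so the highest id is the most complex routine due on that date).
What I have so far: I used nested for loops to achieve it: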
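origin <- "1970-01-01"
# Assumed initialization (the original post does not show it)
result <- data.frame(machine = character(), date = as.Date(character()), rutina = numeric())
count <- 1
for (j in dt[, unique(machine)]){
  # for() drops the Date class, so i is a plain day count since origin
  for (i in dt[machine == j, ][1, datesRoutines[[1]]]){
    result[count, "machine"] <- j
    result[count, "date"] <- as.Date(i, origin = origin)
    # Compare day counts on both sides, then keep the highest matching id
    result[count, "rutina"] <- dt[machine == j, i %in% as.numeric(datesRoutines[[1]]), by = id][V1 == TRUE, max(id)]
    count <- count + 1
  }
}
setDT(result)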
Expected result: I expect a data.table with the machine, the date, and the routine id:
head(result)
  machine       date rutina
1      t1 2018-12-24      5
2      t1 2019-01-21      1
3      t1 2019-02-18      1
4      t1 2019-03-18      2
5      t1 2019-04-15      1
6      t1 2019-05-13      1
Question: Can this be vectorized? What would the code look like?
This is the best simplification I could come up with:
results <- list()
for(m in unique(dt$machine)){
  dates <- dt[machine==m]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=m]
  for(d in dates){
    result[date==d, routine:=dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines),
      .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))),
      by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==TRUE][, max(id)]]
  }
  results[[m]] <- result
}
final_result <- rbindlist(results)
From here, you can go one step further:
results <- list()
for(m in unique(dt$machine)){
  dates <- dt[machine==m]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=m]
  result$routine <- lapply(result$date, function(d){
    dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines),
      .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))),
      by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==TRUE][, max(id)]})
  results[[m]] <- result
}
final_result <- rbindlist(results)
Finally, for the for loop haters:
results <- lapply(unique(dt$machine), function(x){
  dates <- dt[machine==x]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=x]
})
tmp_result <- lapply(results, function(r){
  r$routine <- lapply(r$date, function(d){
    dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines),
      .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))),
      by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==TRUE][, max(id)]})
})
final_results <- rbindlist(results)
final_results$rutina <- unlist(tmp_result)
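As a possible alternative that avoids explicit loops and lapply altogether, one could unnest the list column into long format and then take the maximum id per machine and date. The sketch below is not part of the answer above. It uses a trimmed-down copy of the sample data (only a few dates per routine) so that it stays self-contained, and the names long and result are chosen here for illustration.

```r
library(data.table)

# Trimmed-down sample: same layout as the question's table, fewer dates
dt <- data.table(
  id = c(1, 2, 6, 7),
  machine = c("t1", "t1", "t2", "t2"),
  datesRoutines = list(
    as.Date(c("2018-12-24", "2019-01-21", "2019-02-18", "2019-03-18")),
    as.Date(c("2018-12-24", "2019-03-18")),
    as.Date(c("2018-12-06", "2019-01-03")),
    as.Date("2018-12-06")
  )
)

# Step 1: one row per (machine, id, date). unlist() drops the Date class,
# so the day counts are converted back with as.Date().
long <- dt[, .(date = as.Date(unlist(datesRoutines), origin = "1970-01-01")),
           by = .(machine, id)]

# Step 2: for each machine and date, keep the highest (most complex) id.
result <- long[, .(rutina = max(id)), keyby = .(machine, date)]
print(result)
```

Because the grouping is by both machine and date, a routine from one machine cannot leak into another machine's schedule even if two machines happen to share a date. Applied to the full table from the question, the same two steps should reproduce the expected rows (for example, routine 5 for t1 on 2018-12-24).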