Vectorize R for loop in data.table
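I am building a maintenance scheduler in R. For different machines, I have routines of specific activities that should be performed on specific dates, defined by a frequency and a start date.
I already have a data.table with the frequency (in weeks), the last known date of major maintenance, and the projected dates of each routine, computed from its frequency and last date. A simplified version looks like this: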
require(data.table)
dt <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), machine = c("t1",
"t1", "t1", "t1", "t1", "t2", "t2", "t2", "t2"), frequencyWeeks = c(4,
12, 24, 48, 96, 4, 24, 48, 96), lastMaintenance = structure(c(17889,
17889, 17889, 17889, 17889, 17871, 17871, 17871, 17871), class = "Date"),
datesRoutines = list(structure(c(17889, 17917, 17945, 17973,
18001, 18029, 18057, 18085, 18113, 18141, 18169, 18197, 18225,
18253, 18281, 18309, 18337, 18365, 18393, 18421, 18449, 18477,
18505, 18533, 18561, 18589, 18617), class = "Date"), structure(c(17889,
17973, 18057, 18141, 18225, 18309, 18393, 18477, 18561), class = "Date"),
structure(c(17889, 18057, 18225, 18393, 18561), class = "Date"),
structure(c(17889, 18225, 18561), class = "Date"), structure(c(17889,
18561), class = "Date"), structure(c(17871, 17899, 17927,
17955, 17983, 18011, 18039, 18067, 18095, 18123, 18151,
18179, 18207, 18235, 18263, 18291, 18319, 18347, 18375,
18403, 18431, 18459, 18487, 18515, 18543, 18571, 18599,
18627), class = "Date"), structure(c(17871, 18039, 18207,
18375, 18543), class = "Date"), structure(c(17871, 18207,
18543), class = "Date"), structure(c(17871, 18543), class = "Date"))), class = c("data.table",
"data.frame"), row.names = c(NA, -9L))
dt
   id machine frequencyWeeks lastMaintenance datesRoutines
1:  1      t1              4      2018-12-24 2018-12-24,2019-01-21,2019-02-18,2019-03-18,2019-04-15,2019-05-13,...
2:  2      t1             12      2018-12-24 2018-12-24,2019-03-18,2019-06-10,2019-09-02,2019-11-25,2020-02-17,...
3:  3      t1             24      2018-12-24 2018-12-24,2019-06-10,2019-11-25,2020-05-11,2020-10-26
4:  4      t1             48      2018-12-24 2018-12-24,2019-11-25,2020-10-26
5:  5      t1             96      2018-12-24 2018-12-24,2020-10-26
6:  6      t2              4      2018-12-06 2018-12-06,2019-01-03,2019-01-31,2019-02-28,2019-03-28,2019-04-25,...
7:  7      t2             24      2018-12-06 2018-12-06,2019-05-23,2019-11-07,2020-04-23,2020-10-08
8:  8      t2             48      2018-12-06 2018-12-06,2019-11-07,2020-10-08
9:  9      t2             96      2018-12-06 2018-12-06,2020-10-08
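For readers who want to rebuild a list column like datesRoutines, here is a minimal sketch of how the projected dates could be derived from frequencyWeeks and lastMaintenance. The question does not show how the column was actually produced, so the table name sched and the planning horizon of 2020-12-31 are assumptions made only for illustration.

```r
library(data.table)

# Toy schedule; the values mirror routines 1-3 of machine t1 above.
sched <- data.table(
  id              = 1:3,
  frequencyWeeks  = c(4, 12, 24),
  lastMaintenance = as.Date("2018-12-24")
)

# Hypothetical planning horizon (an assumption, not stated in the question).
horizon <- as.Date("2020-12-31")

# One Date vector per routine: start at lastMaintenance, step by the frequency.
sched[, datesRoutines := lapply(seq_len(.N), function(k) {
  seq(lastMaintenance[k], horizon, by = paste(frequencyWeeks[k], "weeks"))
})]

# Number of projected dates per routine.
lengths(sched$datesRoutines)
```

seq() on a Date accepts a step such as "4 weeks", so each routine gets its own vector of dates and the column stays a list column.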
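Need: for each machine and intervention date, I want to determine the routine with the highest id (routines are recorded in order of increasing complexity, so the highest id is the most complex routine).
What I did: I used nested for loops to achieve it: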
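origin <- "1970-01-01"   # epoch used to turn the loop value back into a Date
result <- data.frame(machine = character(), date = as.Date(character()),
                     rutina = numeric(), stringsAsFactors = FALSE)
count <- 1
for (j in dt[, unique(machine)]) {
  # dates of the machine's first routine (in this sample they cover all other routines)
  for (i in dt[machine == j, ][1, datesRoutines[[1]]]) {
    d <- as.Date(i, origin = origin)   # for() drops the Date class
    result[count, "machine"] <- j
    result[count, "date"] <- d
    # highest id among this machine's routines scheduled on date d
    result[count, "rutina"] <- dt[machine == j, d %in% datesRoutines[[1]], by = id][V1 == TRUE, max(id)]
    count <- count + 1
  }
}
setDT(result)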
Expected result: I expect a data.table with the machine, the date, and the routine id:
head(result)
  machine       date rutina
1      t1 2018-12-24      5
2      t1 2019-01-21      1
3      t1 2019-02-18      1
4      t1 2019-03-18      2
5      t1 2019-04-15      1
6      t1 2019-05-13      1
Question: is it possible to vectorize this? What would the code look like?
This is the best simplification I can come up with:
results <- list()
for(m in unique(dt$machine)){
  # all distinct dates on which any routine of machine m is scheduled
  dates <- dt[machine==m]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=m]
  for(d in dates){
    # per row of dt, flag the routines scheduled on d, then keep the highest id among them
    result[date==d, routine:=dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines),
      .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))),
      by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==TRUE][, max(id)]]
  }
  results[[m]] <- result
}
final_result <- rbindlist(results)
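A short sketch of why the loops in this thread keep calling as.Date(d, origin = "1970-01-01"): iterating over a Date vector with for() yields plain numbers (days since the epoch), so the class has to be restored by hand. The vector below is made-up sample data, and looping over positions is only one possible way to avoid the conversion.

```r
dates <- as.Date(c("2018-12-24", "2019-01-21"))

# Iterating over the values: each d is a bare number (days since 1970-01-01).
for (d in dates) {
  print(class(d))                            # "numeric"
  print(as.Date(d, origin = "1970-01-01"))   # converted back to a Date
}

# Iterating over positions keeps the Date class, so no conversion is needed.
for (k in seq_along(dates)) {
  d <- dates[k]
  print(class(d))                            # "Date"
}
```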
From here, you can go one step further:
results <- list()
for(m in unique(dt$machine)){
  dates <- dt[machine==m]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=m]
  result$routine <- lapply(result$date, function(d){
    dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines),
       .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))),
       by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==T][, max(id)]})
  results[[m]] <- result
}
final_result <- rbindlist(results)
Finally, for the for loop haters:
results <- lapply(unique(dt$machine), function(x){
  dates <- dt[machine==x]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=x]
})
tmp_result <- lapply(results, function(r){
  r$routine <- lapply(r$date, function(d){
    dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines),
       .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))),
       by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==T][, max(id)]})
})
final_results <- rbindlist(results)
final_results$rutina <- unlist(tmp_result)
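For comparison with the loop and lapply versions above, here is a sketch of a fully vectorized alternative that is not part of the original answer: unnest the list column into one row per routine date, then take the highest id per machine and date. The helper mk_dates, its 2020-12-31 horizon, and the reduced four-routine table are hypothetical and exist only to keep the example self-contained. It also assumes that id is unique per row.

```r
library(data.table)

# Helper for the toy data only (hypothetical name and horizon).
mk_dates <- function(start, weeks, end = as.Date("2020-12-31")) {
  seq(as.Date(start), end, by = paste(weeks, "weeks"))
}

dt <- data.table(
  id      = 1:4,
  machine = c("t1", "t1", "t1", "t2"),
  datesRoutines = list(
    mk_dates("2018-12-24", 4),
    mk_dates("2018-12-24", 12),
    mk_dates("2018-12-24", 24),
    mk_dates("2018-12-06", 4)
  )
)

# 1. Unnest: one row per (machine, id, date). Assumes id is unique per row.
long <- dt[, .(date = datesRoutines[[1]]), by = .(machine, id)]

# 2. For each machine and date, keep the highest routine id.
result <- long[, .(rutina = max(id)), by = .(machine, date)]
setorder(result, machine, date)

head(result)
```

Because the grouping is done once on the long table, no per-date lookup into dt is needed, and the dates stay of class Date throughout.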