
Vectorize an R for loop in data.table

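I am building a maintenance scheduler in R. For different machines, I have routines of specific activities that should be performed on specific dates, defined by a frequency and a start date.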

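I already have a data.table with the frequency (in weeks), the last known date of a major maintenance, and the projected dates for each routine, derived from its frequency and last date. A simplified version looks like this: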

require(data.table)

dt <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9), machine = c("t1", 
"t1", "t1", "t1", "t1", "t2", "t2", "t2", "t2"), frequencyWeeks = c(4, 
12, 24, 48, 96, 4, 24, 48, 96), lastMaintenance = structure(c(17889, 
17889, 17889, 17889, 17889, 17871, 17871, 17871, 17871), class = "Date"), 
    datesRoutines = list(structure(c(17889, 17917, 17945, 17973, 
    18001, 18029, 18057, 18085, 18113, 18141, 18169, 18197, 18225, 
    18253, 18281, 18309, 18337, 18365, 18393, 18421, 18449, 18477, 
    18505, 18533, 18561, 18589, 18617), class = "Date"), structure(c(17889, 
    17973, 18057, 18141, 18225, 18309, 18393, 18477, 18561), class = "Date"), 
        structure(c(17889, 18057, 18225, 18393, 18561), class = "Date"), 
        structure(c(17889, 18225, 18561), class = "Date"), structure(c(17889, 
        18561), class = "Date"), structure(c(17871, 17899, 17927, 
        17955, 17983, 18011, 18039, 18067, 18095, 18123, 18151, 
        18179, 18207, 18235, 18263, 18291, 18319, 18347, 18375, 
        18403, 18431, 18459, 18487, 18515, 18543, 18571, 18599, 
        18627), class = "Date"), structure(c(17871, 18039, 18207, 
        18375, 18543), class = "Date"), structure(c(17871, 18207, 
        18543), class = "Date"), structure(c(17871, 18543), class = "Date"))), class = c("data.table", 
"data.frame"), row.names = c(NA, -9L))

dt

   id machine frequencyWeeks lastMaintenance                                                         datesRoutines
1:  1      t1              4      2018-12-24 2018-12-24,2019-01-21,2019-02-18,2019-03-18,2019-04-15,2019-05-13,...
2:  2      t1             12      2018-12-24 2018-12-24,2019-03-18,2019-06-10,2019-09-02,2019-11-25,2020-02-17,...
3:  3      t1             24      2018-12-24                2018-12-24,2019-06-10,2019-11-25,2020-05-11,2020-10-26
4:  4      t1             48      2018-12-24                                      2018-12-24,2019-11-25,2020-10-26
5:  5      t1             96      2018-12-24                                                 2018-12-24,2020-10-26
6:  6      t2              4      2018-12-06 2018-12-06,2019-01-03,2019-01-31,2019-02-28,2019-03-28,2019-04-25,...
7:  7      t2             24      2018-12-06                2018-12-06,2019-05-23,2019-11-07,2020-04-23,2020-10-08
8:  8      t2             48      2018-12-06                                      2018-12-06,2019-11-07,2020-10-08
9:  9      t2             96      2018-12-06                                                 2018-12-06,2020-10-08
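The datesRoutines column is described above as projected from each routine's frequency and last maintenance date. A minimal sketch of how such a list column could be built follows. The table name `sched`, its shortened contents, and the planning end date `horizon` are assumptions made for illustration and do not come from the question:

```r
library(data.table)

# Hypothetical, shortened schedule table (not the question's full data)
sched <- data.table(
  id = 1:3,
  machine = "t1",
  frequencyWeeks = c(4, 12, 24),
  lastMaintenance = as.Date("2018-12-24")
)

# Assumed end of the planning window; the question does not state one
horizon <- as.Date("2019-12-31")

# One Date vector per routine: start at the last maintenance and step
# forward by the frequency, expressed in days (7 * weeks).
# Grouping by id gives one row per group, so seq() sees scalar inputs.
sched[, datesRoutines := list(list(
  seq(lastMaintenance, horizon, by = 7 * frequencyWeeks)
)), by = id]

print(lengths(sched$datesRoutines))
```

Grouping by id keeps the call to seq() scalar, which avoids an explicit loop over rows while still producing a vector of a different length for each routine.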

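What I need: for each machine and intervention date, I want to find the routine with the highest id (routines are recorded in order of increasing complexity, so the highest id is the most complex routine due on that date).

What I have done so far: I used nested for loops to achieve it: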

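origin <- "1970-01-01"  # for loops drop the Date class, so dates are rebuilt from numbers
result <- data.frame(machine = character(), date = as.Date(character()), rutina = numeric())
count <- 1

for (j in dt[, unique(machine)]){
    # the first (most frequent) routine of each machine covers every intervention date
    for (i in dt[machine == j, ][1, datesRoutines[[1]]]){
        d <- as.Date(i, origin = origin)
        result[count, "machine"] <- j
        result[count, "date"] <- d
        # highest id among this machine's routines that are due on date d
        result[count, "rutina"] <- dt[machine == j, d %in% datesRoutines[[1]], by = id][V1 == TRUE, max(id)]
        count <- count + 1
    }
}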

setDT(result)

Expected result: I expect a data.table with the machine, the date and the routine id:

head(result)
  machine       date rutina
1      t1 2018-12-24      5
2      t1 2019-01-21      1
3      t1 2019-02-18      1
4      t1 2019-03-18      2
5      t1 2019-04-15      1
6      t1 2019-05-13      1

Question: is it possible to vectorize this? What would the code look like?

This is the best simplification I could come up with:

results <- list()
for(m in unique(dt$machine)){       
  dates <- dt[machine==m]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=m]
  for(d in dates){
    result[date==d, routine:=dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines), 
                              .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))), 
                              by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==T][, max(id)]]

  }       
  results[[m]] <- result                         

} 
final_result <- rbindlist(results)

From here, you can go one step further:

results <- list()
for(m in unique(dt$machine)){       
  dates <- dt[machine==m]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=m]
  result$routine <-lapply(result$date, function(d){
    dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines), 
       .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))), 
       by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==T][, max(id)]})
  results[[m]] <- result                         

} 
final_result <- rbindlist(results)

And finally, for the for-loop haters:

results <- lapply(unique(dt$machine), function(x){
  dates <- dt[machine==x]$datesRoutines
  dates <- as.Date(unique(unlist(dates)), origin="1970-01-01")
  result <- data.table(date=dates)
  result[, machine:=x]
})

tmp_result<-lapply(results, function(r){
  r$routine <-lapply(r$date, function(d){
    dt[as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines), 
       .(id, ord=as.double(max(which(as.Date(d, origin="1970-01-01") %in% unlist(datesRoutines))))), 
       by=seq_len(nrow(dt))][,.(ord==max(ord), id)][V1==T][, max(id)]})
})

final_results <- rbindlist(results)
final_results$rutina <- unlist(tmp_result)
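The variants above still visit every date one at a time. As a possible alternative, the list column can be unnested into a long table and the highest id picked with a single grouped aggregation. This is only a sketch under stated assumptions: the small `dt` below is a hypothetical, shortened stand-in for the question's table, and the names `long` and `final_result` are chosen for illustration:

```r
library(data.table)

# Hypothetical, shortened stand-in for the question's `dt`
dt <- data.table(
  id = c(1, 2, 3, 4),
  machine = c("t1", "t1", "t2", "t2"),
  datesRoutines = list(
    seq(as.Date("2018-12-24"), by = "4 weeks", length.out = 4),
    seq(as.Date("2018-12-24"), by = "12 weeks", length.out = 2),
    seq(as.Date("2018-12-06"), by = "4 weeks", length.out = 3),
    seq(as.Date("2018-12-06"), by = "8 weeks", length.out = 2)
  )
)

# Step 1: unnest the list column into one row per (machine, id, date).
# Each id occupies a single row, so datesRoutines[[1]] is that
# routine's Date vector and its Date class is kept.
long <- dt[, .(date = datesRoutines[[1]]), by = .(machine, id)]

# Step 2: for every machine and date, keep the highest routine id.
final_result <- long[, .(rutina = max(id)), by = .(machine, date)]
setorder(final_result, machine, date)

print(final_result)
```

Grouping by machine as well as date keeps routines of different machines apart even if two machines happen to share a date.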
