R data.table: setting the remainder of a column value to the next column value if it exceeds a certain threshold, for a large data set
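I am working on a simple peak-shaving algorithm and am looking for the most efficient way to carry the remainder of a column's value over to the next column whenever that value exceeds a threshold, for a large time series.
Consider the following sample data set, with a threshold defined for each column. The goal is a data.table in which every value is capped at its column's threshold and the excess is added to the next column's value (which is in turn capped at its own threshold), and so on up to some window limit.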
library(data.table)
loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c("time1" = 5000,
                "time2" = 5000,
                "time3" = 5000,
                "time4" = 12000,
                "time5" = 12000,
                "time6" = 12000,
                "time7" = 5000)
For a window of 7 columns, this should result in the following data.table:
res <- data.table(index = 1:3,
                  time1 = c(5000, 3000, 5000),
                  time2 = c(5000, 4000, 5000),
                  time3 = c(5000, 0, 4000),
                  time4 = c(6600, 12000, 0),
                  time5 = c(5000, 2000, 3000),
                  time6 = c(0, 0, 0),
                  time7 = c(5000, 0, 0))
I know there are some obvious ways to do this row by row, but I am looking for a more vectorized/data.table way to do it.
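For reference, the row-wise carry that the question wants to avoid can be sketched in base R as below. The helper name `cap_and_carry` and the variables `limits` and `row1` are hypothetical and only illustrate the arithmetic on the first row of the sample data; this is not code from the original post.

```r
# Hypothetical helper: cap one row at its thresholds and carry any excess
# forward to the next column.
cap_and_carry <- function(x, limit) {
  carry <- 0
  for (i in seq_along(x)) {
    total <- x[i] + carry
    x[i] <- min(total, limit[i])  # value kept in this column
    carry <- total - x[i]         # excess pushed to the next column
  }
  x  # whatever is left in `carry` after the last column is dropped
}

limits <- c(5000, 5000, 5000, 12000, 12000, 12000, 5000)
row1 <- c(6600, 12000, 0, 3000, 5000, 0, 15000)
capped_row1 <- cap_and_carry(row1, limits)
print(capped_row1)
```

Applied to every row this gives the expected table, but it loops over rows, which is the slow dimension in a large time series.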
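I don't think this is easy (or even possible?) to "just" vectorize or write as canonical data.table code, but here is a straightforward for loop that is about as data.table-like as I think is reasonable.
timeX: I add timeX to both thresholds (with a limit of Inf) and loads (with a value of 0) as a catch-all column, so that we know how much is "lost" off the end of each row. It is also convenient for the for loop, though the loop can be written without it with some rewriting.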
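library(data.table)
thresholds <- c("time1" = 5000,
                "time2" = 5000,
                "time3" = 5000,
                "time4" = 12000,
                "time5" = 12000,
                "time6" = 12000,
                "time7" = 5000,
                "timeX" = Inf)
loads[, timeX := 0 ]
for (ind in seq_along(thresholds)) {
  # timeX is the last, uncapped column, so there is nothing to carry out of it
  if (ind >= length(thresholds)) break
  nm  <- names(thresholds)[ind]      # current column
  nm1 <- names(thresholds)[ind + 1]  # next column, which receives the excess
  # excess above this column's threshold, per row
  rmndr <- pmax(0, loads[[nm]] - thresholds[ind])
  # cap the current column, then add the excess to the next column (by reference)
  set(loads, i = NULL, j = nm, value = pmin(loads[[nm]], thresholds[ind]))
  set(loads, i = NULL, j = nm1, value = loads[[nm1]] + rmndr)
}
loads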
#    index time1 time2 time3 time4 time5 time6 time7 timeX
#    <int> <num> <num> <num> <num> <num> <num> <num> <num>
# 1:     1  5000  5000  5000  6600  5000     0  5000 10000
# 2:     2  3000  4000     0 12000  2000     0     0     0
# 3:     3  5000  5000  4000     0  3000     0     0     0
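Note that set() updates loads in place, so the original values are overwritten. If the input needs to stay intact, the same column-wise idea could be wrapped in a function that works on a copy. The sketch below is one possible arrangement and not the answer's code: the function name `shave_peaks`, its signature, and the output column `lost` are assumptions, and it keeps a running carry vector instead of a timeX column.

```r
library(data.table)

# Hypothetical wrapper (name and signature are assumptions).
# `limits` is a named numeric vector; its names are columns of `dt`, in order.
shave_peaks <- function(dt, limits) {
  out <- copy(dt)               # leave the caller's table untouched
  carry <- numeric(nrow(out))   # excess carried into the current column
  for (nm in names(limits)) {
    total <- out[[nm]] + carry
    set(out, j = nm, value = pmin(total, limits[[nm]]))  # cap the column
    carry <- total - out[[nm]]                           # excess to pass on
  }
  set(out, j = "lost", value = carry)  # excess left after the last column
  out[]
}

loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c(time1 = 5000, time2 = 5000, time3 = 5000, time4 = 12000,
                time5 = 12000, time6 = 12000, time7 = 5000)

capped <- shave_peaks(loads, thresholds)
print(capped)
```

The loop still runs once per column rather than once per row, so the per-row work stays vectorized through pmin().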
Or, if you really don't care about the discarded amount:
## using unmodified `loads` and `thresholds`
for (ind in seq_along(thresholds)) {
  nm <- names(thresholds)[ind]
  # excess above this column's threshold, per row
  rmndr <- pmax(0, loads[[nm]] - thresholds[nm])
  set(loads, i = NULL, j = nm, value = pmin(loads[[nm]], thresholds[nm]))
  # the excess from the last column is simply discarded
  if (ind == length(thresholds)) break
  nm1 <- names(thresholds)[ind + 1]
  set(loads, i = NULL, j = nm1, value = loads[[nm1]] + rmndr)
}
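As a quick sanity check, the loop without timeX can be run on a fresh copy of the sample data and compared against the expected table from the question. The block below is a self-contained sketch that rebuilds loads, thresholds and res as given in the question; the use of all.equal() for the comparison is my own choice, not part of the original answer.

```r
library(data.table)

loads <- data.table(index = 1:3,
                    time1 = c(6600, 3000, 12000),
                    time2 = c(12000, 4000, 2000),
                    time3 = c(0, 0, 0),
                    time4 = c(3000, 12000, 0),
                    time5 = c(5000, 2000, 3000),
                    time6 = c(0, 0, 0),
                    time7 = c(15000, 0, 0))
thresholds <- c(time1 = 5000, time2 = 5000, time3 = 5000, time4 = 12000,
                time5 = 12000, time6 = 12000, time7 = 5000)
res <- data.table(index = 1:3,
                  time1 = c(5000, 3000, 5000),
                  time2 = c(5000, 4000, 5000),
                  time3 = c(5000, 0, 4000),
                  time4 = c(6600, 12000, 0),
                  time5 = c(5000, 2000, 3000),
                  time6 = c(0, 0, 0),
                  time7 = c(5000, 0, 0))

# Same loop as in the answer: cap each column and push the excess forward.
for (ind in seq_along(thresholds)) {
  nm <- names(thresholds)[ind]
  rmndr <- pmax(0, loads[[nm]] - thresholds[nm])
  set(loads, i = NULL, j = nm, value = pmin(loads[[nm]], thresholds[nm]))
  if (ind == length(thresholds)) break
  nm1 <- names(thresholds)[ind + 1]
  set(loads, i = NULL, j = nm1, value = loads[[nm1]] + rmndr)
}

# Compare the capped table with the expected result from the question.
print(all.equal(loads, res, check.attributes = FALSE))
```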