Make the loop more efficient and faster in R
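I wrote the following code for my data and it works fine, but because my dataset is large it takes far too long to run. Can someone make the code more efficient and faster? Many thanks in advance!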
for (f in 1:nlevels(try$IDISIN)) {
  temp <- subset(try, IDISIN == levels(try$IDISIN)[f])
  temp <- as.data.table(temp)
  temp2 <- temp %>%
    arrange(TradingDate)
  temp2 <- as.data.table(temp2)
  for (i in 1:nrow(temp2)) {
    temp2$CSum[i] <- ifelse(i == "1", temp2$Dailysum[1], (temp2$CSum[i-1] + temp2$Dailysum[i]))
    if (temp2$CSum[i] < 0) {
      Selling <- bind_rows(Selling, temp2[i])
      temp2$CSum[i] <- 0
    }
    temp2$CSum[i] <- ifelse(temp2$FinalInd[i] == 1,
                            temp2$CSum[i] / temp2$A.Factor[i],
                            temp2$CSum[i])
  }
  Rebind <- bind_rows(Rebind, temp2)
  rm(list = "temp", "temp2")
}
Here is a simplified dataset:
try <- data.frame(ISIN = c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate = c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum = c("-4", "8", "1", "2", "-6", "9", "4", "8", "9"),
                  A.Factor = c("0", "0", "0.1", "0", "0", "0", "0", "0.5", "0"),
                  Ind = c("0", "0", "1", "0", "0", "0", "0", "1", "0"))
library(data.table)
try<-as.data.table(try)
try[,IDISIN:=paste(ISIN, ID,sep = "-")]
Selling<-try[is.na(ISIN)]
Rebind<-try[is.na(ISIN)]
   ISIN ID TradingDate Dailysum A.Factor Ind IDISIN
1:  abc  A  2022-07-01       -4        0   0  abc-A
2:  abc  A  2022-07-02        8        0   0  abc-A
3:  ghi  A  2022-07-03        1      0.1   1  ghi-A
4:  def  B  2022-07-01        2        0   0  def-B
5:  def  B  2022-07-02       -6        0   0  def-B
6:  def  B  2022-07-03        9        0   0  def-B
7:  ghi  C  2022-07-01        4        0   0  ghi-C
8:  ghi  C  2022-07-02        8      0.5   1  ghi-C
9:  ghi  C  2022-07-03        9        0   0  ghi-C
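I need to do the following:
Compute the cumulative sum by ISIN and ID.
If the cumulative sum is negative:
  save that row to a separate data frame ("Selling" in the code above);
  set the cumulative sum to 0.
If Ind = 1, divide the cumulative sum by the factor.
So it should look like this: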
dat <- data.frame(ISIN = c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate = c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum = c("-4", "8", "1", "2", "-6", "9", "4", "8", "9"),
                  A.Factor = c("0", "0", "0.1", "0", "0", "0", "0", "0.5", "0"),
                  Ind = c("0", "0", "1", "0", "0", "0", "0", "1", "0"),
                  CSum = c("0", "8", "10", "2", "0", "9", "4", "24", "33"))
  ISIN ID       Date Quantity Factor Ind CumulativeSum
1  abc  A 2022-07-01       -4      0   0             0
2  abc  A 2022-07-02        8      0   0             8
3  ghi  A 2022-07-03        1    0.1   1            10
4  def  B 2022-07-01        2      0   0             2
5  def  B 2022-07-02       -6      0   0             0
6  def  B 2022-07-03        9      0   0             9
7  ghi  C 2022-07-01        4      0   0             4
8  ghi  C 2022-07-02        8    0.5   1            24
9  ghi  C 2022-07-03        9      0   0            33
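I couldn't run your first code block, so I'm not sure what Selling should look like, but I think we can compute it as part of the main processing and you can filter it out afterwards.
First, I think many of these columns should be numeric, so: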
try[, names(try) := lapply(.SD, type.convert, as.is = TRUE)]
str(try)
# Classes 'data.table' and 'data.frame': 9 obs. of 6 variables:
# $ ISIN : chr "abc" "abc" "ghi" "def" ...
# $ ID : chr "A" "A" "A" "B" ...
# $ TradingDate: chr "2022-07-01" "2022-07-02" "2022-07-03" "2022-07-01" ...
# $ Dailysum : int -4 8 1 2 -6 9 4 8 9
# $ A.Factor : num 0 0 0.1 0 0 0 0 0.5 0
# $ Ind : int 0 0 1 0 0 0 0 1 0
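As a small illustration of why that one-liner yields the types shown in the str() output, here is a sketch of type.convert() applied to standalone character vectors. The variable names (ints, nums, chars) are made up for this example and are not part of the answer.

```r
# type.convert() inspects each character vector and picks the simplest type
# that can represent every value; as.is = TRUE keeps text as character
# instead of converting it to factor.
ints  <- type.convert(c("-4", "8", "1"), as.is = TRUE)     # whole numbers
nums  <- type.convert(c("0", "0.1", "0.5"), as.is = TRUE)  # decimals
chars <- type.convert(c("abc", "def"), as.is = TRUE)       # not numeric at all

class(ints)   # "integer"
class(nums)   # "numeric"
class(chars)  # "character"
```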
Second, I don't think we need IDISIN: I believe you are only using it as a simple grouping variable, in which case data.table's by= will handle the grouping for us.
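To illustrate the point about by=, the following sketch compares grouping on a pasted key with grouping on the two columns directly. The table dt and the columns sum_pasted and sum_direct are toy names invented for this example.

```r
library(data.table)

# Toy data (made up for illustration): ISIN and ID together define a group.
dt <- data.table(
  ISIN     = c("abc", "abc", "def", "def"),
  ID       = c("A", "A", "B", "B"),
  Dailysum = c(1, 2, 3, 4)
)

# Option 1: build a pasted helper key, then group by it.
dt[, IDISIN := paste(ISIN, ID, sep = "-")]
dt[, sum_pasted := cumsum(Dailysum), by = IDISIN]

# Option 2: group by both columns directly; no helper column is needed.
dt[, sum_direct := cumsum(Dailysum), by = .(ISIN, ID)]

# Both approaches produce the same per-group running sums.
identical(dt$sum_pasted, dt$sum_direct)
```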
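Third, I'm assuming you control the order of the data (TradingDate) externally, perhaps with setkey(try, ISIN, ID, TradingDate) or similar. I do no checks here (and make no promises if that assumption does not hold). (Whether you want try[, TradingDate := as.Date(TradingDate)] is up to you; it seems logical to do so, but it changes nothing here.)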
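A minimal sketch of that ordering step, assuming the rows arrive out of date order. The table dt below is toy data invented for the example; setkey() sorts the table by reference on the listed columns.

```r
library(data.table)

# Toy data (made up): one ISIN/ID group whose rows are not in date order.
dt <- data.table(
  ISIN        = c("abc", "abc", "abc"),
  ID          = c("A", "A", "A"),
  TradingDate = c("2022-07-03", "2022-07-01", "2022-07-02"),
  Dailysum    = c(1, -4, 8)
)

# Optional: convert to Date. Strings in yyyy-mm-dd form already sort
# chronologically, so the resulting row order is the same either way.
dt[, TradingDate := as.Date(TradingDate)]

# Sort by group and then by date, in place.
setkey(dt, ISIN, ID, TradingDate)

dt$Dailysum  # now in chronological order: -4 8 1
```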
From here,
fun <- function(prev, this) {
z <- prev[1] + this[1]
c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}
try[, c("CumulativeSum", "Sell") :=
transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
init = c(0, 0), accumulate = TRUE)[-1]),
by = .(ISIN, ID) ]
try
#      ISIN     ID TradingDate Dailysum A.Factor   Ind CumulativeSum  Sell
#    <char> <char>      <char>    <int>    <num> <int>         <num> <num>
# 1:    abc      A  2022-07-01       -4      0.0     0             0     4
# 2:    abc      A  2022-07-02        8      0.0     0             8     0
# 3:    ghi      A  2022-07-03        1      0.1     1            10     0
# 4:    def      B  2022-07-01        2      0.0     0             2     0
# 5:    def      B  2022-07-02       -6      0.0     0             0     4
# 6:    def      B  2022-07-03        9      0.0     0             9     0
# 7:    ghi      C  2022-07-01        4      0.0     0             4     0
# 8:    ghi      C  2022-07-02        8      0.5     1            24     0
# 9:    ghi      C  2022-07-03        9      0.0     0            33     0
No for loop is needed.
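The earlier remark that Selling can be filtered out afterwards could look roughly like the sketch below. It rebuilds the sample data under the hypothetical name dat (already numeric, so no type conversion is needed) and assumes that "a row belongs in Selling" means Sell > 0. Note that Sell holds the positive size of the shortfall, whereas the loop in the question stored the row with its negative running sum, so adjust this if you need that exact shape.

```r
library(data.table)

# Sample data from the question, rebuilt as numeric columns.
# `dat` is a hypothetical name used only in this sketch.
dat <- data.table(
  ISIN        = c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
  ID          = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  TradingDate = rep(c("2022-07-01", "2022-07-02", "2022-07-03"), 3),
  Dailysum    = c(-4, 8, 1, 2, -6, 9, 4, 8, 9),
  A.Factor    = c(0, 0, 0.1, 0, 0, 0, 0, 0.5, 0),
  Ind         = c(0, 0, 1, 0, 0, 0, 0, 1, 0)
)

# Same step function as in the answer.
fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}

dat[, c("CumulativeSum", "Sell") :=
      transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
                       init = c(0, 0), accumulate = TRUE)[-1]),
    by = .(ISIN, ID)]

# Assumption: rows where the running sum dipped below zero are the "Selling" rows.
Selling <- dat[Sell > 0]
Selling
```

With the sample data this keeps two rows: abc/A on 2022-07-01 and def/B on 2022-07-02.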
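Admittedly, the Sell logic may need double-checking to make sure consecutive negative values behave the way you need. Any such change should be made in fun. It takes two arguments:
prev holds the previous row's values of CumulativeSum and Sell, as determined by the previous call to fun (within the group). The first time fun is called for a group, prev is seeded with c(0, 0) (via the init= argument).
this is the current row's triplet c(Dailysum, Ind, A.Factor) (all numeric, unnamed), which is why we index it directly by position.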
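To make the roles of prev and this concrete, the following base-R sketch traces fun by hand over the ghi/C group from the sample data, and then over a made-up group with two consecutive negative days. The names rows, steps, neg_rows and neg_steps are invented for this illustration.

```r
# Same step function as in the answer.
fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}

# One c(Dailysum, Ind, A.Factor) triplet per row of the ghi/C group.
rows <- list(c(4, 0, 0), c(8, 1, 0.5), c(9, 0, 0))

# accumulate = TRUE keeps every intermediate c(CumulativeSum, Sell) pair;
# element 1 is the init value, elements 2..4 correspond to the three rows.
steps <- Reduce(fun, rows, init = c(0, 0), accumulate = TRUE)
steps[[2]]  # c(4, 0):  0 + 4 = 4
steps[[3]]  # c(24, 0): (4 + 8) / 0.5 = 24, because Ind is 1
steps[[4]]  # c(33, 0): 24 + 9 = 33

# Made-up group with two negative days in a row (Ind = 0 throughout).
neg_rows  <- list(c(-4, 0, 0), c(-3, 0, 0))
neg_steps <- Reduce(fun, neg_rows, init = c(0, 0), accumulate = TRUE)
neg_steps[[2]]  # c(0, 4)
neg_steps[[3]]  # c(0, 3): the sum was reset to 0, so only this day's -3 counts
```

As the second trace shows, Sell is reported per row and does not accumulate across consecutive negative days; if you need a running shortfall instead, fun is the place to change that.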