Making a loop more efficient and faster in R

Question

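I wrote the following code for my data. It works fine, but because my dataset is large, it takes far too long to run. Can someone make the code more efficient and faster? Many thanks in advance!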

for (f in 1: nlevels(try$IDISIN)) {
  temp<-subset(try, IDISIN==levels(try$IDISIN)[f])
  temp<-as.data.table(temp)
  temp2<-temp %>%
    arrange(TradingDate) 
  temp2<-as.data.table(temp2)

  
  for (i in 1:nrow(temp2)) {
    temp2$CSum[i]<-ifelse(i=="1", temp2$Dailysum[1],(temp2$CSum[i-1] + temp2$Dailysum[i]))
    if (temp2$CSum[i]<0) {
      Selling<-bind_rows(Selling, temp2[i])
      temp2$CSum[i]<-0
    }
    temp2$CSum[i]<-ifelse(temp2$FinalInd[i]==1, 
                          temp2$CSum[i]/temp2$A.Factor[i], 
                          temp2$CSum[i])
    
    
  }
  Rebind<-bind_rows(Rebind, temp2) 
  rm(list = "temp", "temp2")
}

Here is a simplified dataset

try<- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
                  A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
                  Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"))
library(data.table)
try<-as.data.table(try)
try[,IDISIN:=paste(ISIN, ID,sep = "-")]
Selling<-try[is.na(ISIN)]
Rebind<-try[is.na(ISIN)]


  ISIN ID TradingDate Dailysum A.Factor Ind IDISIN
1:  abc  A  2022-07-01       -4        0   0  abc-A
2:  abc  A  2022-07-02        8        0   0  abc-A
3:  ghi  A  2022-07-03        1      0.1   1  ghi-A
4:  def  B  2022-07-01        2        0   0  def-B
5:  def  B  2022-07-02       -6        0   0  def-B
6:  def  B  2022-07-03        9        0   0  def-B
7:  ghi  C  2022-07-01        4        0   0  ghi-C
8:  ghi  C  2022-07-02        8      0.5   1  ghi-C
9:  ghi  C  2022-07-03        9        0   0  ghi-C

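I need to do the following:

Compute the cumulative sum per ISIN and ID
If the cumulative sum is negative
1. Save that row to a separate data frame (`Selling` in the code above)
2. Set the cumulative sum to 0
If Ind=1, divide the cumulative sum by the factor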

So the result should look like this

dat <- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
                  A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
                  Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"),
                  CSum= c("0", "8", "10", "2", "0","9", "4", "24", "33"))

  ISIN ID       Date Quantity Factor Ind CumulativeSum
1  abc  A 2022-07-01       -4      0   0             0
2  abc  A 2022-07-02        8      0   0             8
3  ghi  A 2022-07-03        1    0.1   1            10
4  def  B 2022-07-01        2      0   0             2
5  def  B 2022-07-02       -6      0   0             0
6  def  B 2022-07-03        9      0   0             9
7  ghi  C 2022-07-01        4      0   0             4
8  ghi  C 2022-07-02        8    0.5   1            24
9  ghi  C 2022-07-03        9      0   0            33

Answer 1

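I can't run your first code block, so I'm not sure what Selling is supposed to look like, but I think we can compute it as part of the main processing and you can filter it out afterwards.

First, I think many of these columns should be numeric, so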

try[, names(try) := lapply(.SD, type.convert, as.is = TRUE)]
str(try)
# Classes 'data.table' and 'data.frame':    9 obs. of  6 variables:
#  $ ISIN       : chr  "abc" "abc" "ghi" "def" ...
#  $ ID         : chr  "A" "A" "A" "B" ...
#  $ TradingDate: chr  "2022-07-01" "2022-07-02" "2022-07-03" "2022-07-01" ...
#  $ Dailysum   : int  -4 8 1 2 -6 9 4 8 9
#  $ A.Factor   : num  0 0 0.1 0 0 0 0 0.5 0
#  $ Ind        : int  0 0 1 0 0 0 0 1 0

Second, I don't think we need IDISIN, since I believe you are using it as a simple grouping variable, in which case data.table's by= handles that for us.
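As a minimal sketch of what grouping with by= looks like, here is a plain grouped running total on a small made-up table. The table `dt` and the column `RunningTotal` are illustrative only, and this does not yet apply the reset-at-zero rule.

```r
library(data.table)

# Hypothetical toy data with two (ISIN, ID) groups.
dt <- data.table(
  ISIN     = c("abc", "abc", "def", "def"),
  ID       = c("A", "A", "B", "B"),
  Dailysum = c(-4L, 8L, 2L, -6L)
)

# by = .(ISIN, ID) restarts cumsum() for every group,
# so no pasted-together key column is required.
dt[, RunningTotal := cumsum(Dailysum), by = .(ISIN, ID)]
```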

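Third, I'm assuming that you control the order of the data (by TradingDate) externally, perhaps with setkey(try, ISIN, ID, TradingDate) or similar. I make no checks for it (and no promises if that is not the case). (Converting with try[, TradingDate := as.Date(TradingDate)] is up to you; it seems logical to do, but it changes nothing here.)

From here,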
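If the rows are not already in date order, one way to arrange that is sketched below on a made-up table (`dt` is illustrative, not the real data). It converts the date column and then sorts by reference with setkey.

```r
library(data.table)

# Hypothetical toy table, deliberately out of date order within its group.
dt <- data.table(
  ISIN        = c("abc", "abc", "abc"),
  ID          = c("A", "A", "A"),
  TradingDate = c("2022-07-03", "2022-07-01", "2022-07-02"),
  Dailysum    = c(1L, -4L, 8L)
)

# Optional: store the dates as Date rather than character.
dt[, TradingDate := as.Date(TradingDate)]

# Sort by reference and mark these columns as the key.
setkey(dt, ISIN, ID, TradingDate)
```

Because ISO-formatted date strings sort correctly as text, the conversion is not strictly required for the ordering itself.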

fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}
try[, c("CumulativeSum", "Sell") :=
        transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
                         init = c(0, 0), accumulate = TRUE)[-1]),
    by = .(ISIN, ID) ]
try
#      ISIN     ID TradingDate Dailysum A.Factor   Ind CumulativeSum  Sell
#    <char> <char>      <char>    <int>    <num> <int>         <num> <num>
# 1:    abc      A  2022-07-01       -4      0.0     0             0     4
# 2:    abc      A  2022-07-02        8      0.0     0             8     0
# 3:    ghi      A  2022-07-03        1      0.1     1            10     0
# 4:    def      B  2022-07-01        2      0.0     0             2     0
# 5:    def      B  2022-07-02       -6      0.0     0             0     4
# 6:    def      B  2022-07-03        9      0.0     0             9     0
# 7:    ghi      C  2022-07-01        4      0.0     0             4     0
# 8:    ghi      C  2022-07-02        8      0.5     1            24     0
# 9:    ghi      C  2022-07-03        9      0.0     0            33     0

No for loop is needed.
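To recover something like the separate Selling table from the question, one option is to filter on the Sell column afterwards. This is only a sketch: `res` is a hand-typed stand-in for a few rows of the result shown above, and whether Selling should hold exactly these rows is an assumption.

```r
library(data.table)

# Hypothetical stand-in for part of the result above (relevant columns only).
res <- data.table(
  ISIN          = c("abc", "abc", "def", "def", "def"),
  ID            = c("A", "A", "B", "B", "B"),
  TradingDate   = c("2022-07-01", "2022-07-02",
                    "2022-07-01", "2022-07-02", "2022-07-03"),
  CumulativeSum = c(0, 8, 2, 0, 9),
  Sell          = c(4, 0, 0, 4, 0)
)

# Rows where the running total went negative and was reset to 0.
Selling <- res[Sell > 0]
```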

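Admittedly, the Sell logic may need double-checking to make sure consecutive negatives behave the way you need. That should be handled inside fun. It takes two arguments:

prev holds the previous row's values of CumulativeSum and Sell, as determined by the previous call to fun (within the group). The first time it is called for a group, it is seeded with c(0, 0) (via the init= argument).
this is the triplet c(Dailysum, Ind, A.Factor) for the current row (all numeric, unnamed), so we index it directly by position.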
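To check how consecutive negatives behave, fun can be traced on a single made-up group using base R only. The `rows` list below is hypothetical; each element plays the role of `this`.

```r
# Same helper as in the answer above.
fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}

# Hypothetical single group: each element is c(Dailysum, Ind, A.Factor).
rows <- list(c(-4, 0, 0), c(-3, 0, 0), c(8, 0, 0), c(2, 1, 0.5))

# accumulate = TRUE keeps every intermediate c(CumulativeSum, Sell);
# [-1] drops the init value c(0, 0).
steps <- Reduce(fun, rows, accumulate = TRUE, init = c(0, 0))[-1]

# One row per input row: column 1 is CumulativeSum, column 2 is Sell.
m <- do.call(rbind, steps)
m
```

Here the first two rows give a CumulativeSum of 0 with Sell values of 4 and 3: because the total is reset to 0 each time, every negative row records its own shortfall rather than an accumulated one. Note also that when Ind > 0 the division by A.Factor applies to both elements, so Sell would be scaled as well.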
