
Make the loop more efficient and faster in R

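I wrote the following code for my data and it works fine, but because my dataset is large it takes far too long to run. Can someone make the code more efficient and faster? Many thanks in advance!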

for (f in 1: nlevels(try$IDISIN)) {
  temp<-subset(try, IDISIN==levels(try$IDISIN)[f])
  temp<-as.data.table(temp)
  temp2<-temp %>%
    arrange(TradingDate) 
  temp2<-as.data.table(temp2)

  
  for (i in 1:nrow(temp2)) {
    temp2$CSum[i]<-ifelse(i=="1", temp2$Dailysum[1],(temp2$CSum[i-1] + temp2$Dailysum[i]))
    if (temp2$CSum[i]<0) {
      Selling<-bind_rows(Selling, temp2[i])
      temp2$CSum[i]<-0
    }
    temp2$CSum[i]<-ifelse(temp2$FinalInd[i]==1, 
                          temp2$CSum[i]/temp2$A.Factor[i], 
                          temp2$CSum[i])
    
    
  }
  Rebind<-bind_rows(Rebind, temp2) 
  rm(list = "temp", "temp2")
}

Here is a simplified dataset:

try<- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
                  A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
                  Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"))
library(data.table)
try<-as.data.table(try)
try[,IDISIN:=paste(ISIN, ID,sep = "-")]
Selling<-try[is.na(ISIN)]
Rebind<-try[is.na(ISIN)]


   ISIN ID TradingDate Dailysum A.Factor Ind IDISIN
1:  abc  A  2022-07-01       -4        0   0  abc-A
2:  abc  A  2022-07-02        8        0   0  abc-A
3:  ghi  A  2022-07-03        1      0.1   1  ghi-A
4:  def  B  2022-07-01        2        0   0  def-B
5:  def  B  2022-07-02       -6        0   0  def-B
6:  def  B  2022-07-03        9        0   0  def-B
7:  ghi  C  2022-07-01        4        0   0  ghi-C
8:  ghi  C  2022-07-02        8      0.5   1  ghi-C
9:  ghi  C  2022-07-03        9        0   0  ghi-C


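I need to do the following:

  1. Compute the cumulative sum by ISIN and ID

  2. If the cumulative sum is negative

    1. Save that row to a separate data frame ("Selling" in the code above)

    2. Set the cumulative sum to 0

  3. If Ind=1, divide the cumulative sum by the factor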

So it should look like this:

dat <- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
                  A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
                  Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"),
                  CSum= c("0", "8", "10", "2", "0","9", "4", "24", "33"))

  ISIN ID       Date Quantity Factor Ind CumulativeSum
1  abc  A 2022-07-01       -4      0   0             0
2  abc  A 2022-07-02        8      0   0             8
3  ghi  A 2022-07-03        1    0.1   1            10
4  def  B 2022-07-01        2      0   0             2
5  def  B 2022-07-02       -6      0   0             0
6  def  B 2022-07-03        9      0   0             9
7  ghi  C 2022-07-01        4      0   0             4
8  ghi  C 2022-07-02        8    0.5   1            24
9  ghi  C 2022-07-03        9      0   0            33

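I can't run your first code block, so I'm not sure what Selling should look like, but I think we can compute it as part of the main processing and you can filter it out afterwards.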

First, I think many of these columns should be numeric, so:

try[, names(try) := lapply(.SD, type.convert, as.is = TRUE)]
str(try)
# Classes 'data.table' and 'data.frame':    9 obs. of  6 variables:
#  $ ISIN       : chr  "abc" "abc" "ghi" "def" ...
#  $ ID         : chr  "A" "A" "A" "B" ...
#  $ TradingDate: chr  "2022-07-01" "2022-07-02" "2022-07-03" "2022-07-01" ...
#  $ Dailysum   : int  -4 8 1 2 -6 9 4 8 9
#  $ A.Factor   : num  0 0 0.1 0 0 0 0 0.5 0
#  $ Ind        : int  0 0 1 0 0 0 0 1 0

Second, I don't think we need IDISIN. I believe you are using it as a simple grouping variable, in which case data.table's by= handles that for us.
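As a small illustration of that point, the sketch below groups directly on the two columns instead of on a pasted key. The table `dt` and the column `RunningTotal` are made-up names for this example, and the result is a plain cumulative sum without the floor-at-zero rule.

```r
library(data.table)

# Hypothetical toy data, two (ISIN, ID) groups.
dt <- data.table(
  ISIN     = c("abc", "abc", "def", "def"),
  ID       = c("A", "A", "B", "B"),
  Dailysum = c(-4, 8, 2, -6)
)

# Group directly on both columns; no pasted IDISIN key is needed.
# This is a plain running total, not yet floored at zero.
dt[, RunningTotal := cumsum(Dailysum), by = .(ISIN, ID)]
```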

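Third, I'm assuming you control the order of the data (by TradingDate) externally, perhaps with setkey(try, ISIN, ID, TradingDate) or similar. I do no checking for this, and make no promises if it isn't true. (Whether to run try[, TradingDate := as.Date(TradingDate)] is up to you; it seems logical to do, but it changes nothing here.)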
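If the rows are not already in order, one way to sort them is sketched below. The table `dt` is a made-up example with one group deliberately out of order; `setorder()` sorts by reference, and `setkey()` would work as well if you also want a key.

```r
library(data.table)

# Hypothetical toy data, deliberately out of date order.
dt <- data.table(
  ISIN        = c("def", "def", "def"),
  ID          = c("B", "B", "B"),
  TradingDate = c("2022-07-03", "2022-07-01", "2022-07-02"),
  Dailysum    = c(9L, 2L, -6L)
)

# Optional: store the dates as Date (ISO strings also sort correctly as text).
dt[, TradingDate := as.Date(TradingDate)]

# Sort in place so each (ISIN, ID) group is in chronological order.
setorder(dt, ISIN, ID, TradingDate)
```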

From here:

fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}
try[, c("CumulativeSum", "Sell") :=
        transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
                         init = c(0, 0), accumulate = TRUE)[-1]),
    by = .(ISIN, ID) ]
try
#      ISIN     ID TradingDate Dailysum A.Factor   Ind CumulativeSum  Sell
#    <char> <char>      <char>    <int>    <num> <int>         <num> <num>
# 1:    abc      A  2022-07-01       -4      0.0     0             0     4
# 2:    abc      A  2022-07-02        8      0.0     0             8     0
# 3:    ghi      A  2022-07-03        1      0.1     1            10     0
# 4:    def      B  2022-07-01        2      0.0     0             2     0
# 5:    def      B  2022-07-02       -6      0.0     0             0     4
# 6:    def      B  2022-07-03        9      0.0     0             9     0
# 7:    ghi      C  2022-07-01        4      0.0     0             4     0
# 8:    ghi      C  2022-07-02        8      0.5     1            24     0
# 9:    ghi      C  2022-07-03        9      0.0     0            33     0

No for loop is needed.
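To get something like the separate Selling table afterwards, one option is to filter on the Sell column. This is only a sketch under the assumption that a positive Sell marks the rows you want to keep; the table `dt` is a reduced, made-up version of the sample data, and the grouped update is the same as the one shown earlier.

```r
library(data.table)

# Hypothetical reduced data: two groups, already in date order.
dt <- data.table(
  ISIN     = c("abc", "abc", "def", "def", "def"),
  ID       = c("A", "A", "B", "B", "B"),
  Dailysum = c(-4, 8, 2, -6, 9),
  A.Factor = c(0, 0, 0, 0, 0),
  Ind      = c(0, 0, 0, 0, 0)
)

# Same reducer as in the answer: returns c(CumulativeSum, Sell) for one row.
fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}

dt[, c("CumulativeSum", "Sell") :=
     transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
                      init = c(0, 0), accumulate = TRUE)[-1]),
   by = .(ISIN, ID)]

# Assumption: rows with a positive Sell are the ones that went negative.
Selling <- dt[Sell > 0]
```

Here Selling keeps the first abc/A row and the second def/B row, and dt itself stays intact for further processing.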

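Admittedly, the Sell logic may need a closer look to make sure consecutive negatives behave the way you need. That should be handled inside fun. Its two arguments are: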

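  • prev: the CumulativeSum and Sell values of the previous row, as determined by the previous call to fun (within the group). The first time it is called for a group, it is given the pre-assigned value c(0, 0) (via the init= argument).
  • this: the triple c(Dailysum, Ind, A.Factor) for the current row (all numeric, unnamed), which is why we index it directly by position.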
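To see how prev and this move through Reduce, here is a small trace outside data.table. The input lists are hand-built for illustration (the first mirrors the def/B rows of the sample data, the second is a made-up pair of consecutive negatives); each element is c(Dailysum, Ind, A.Factor).

```r
fun <- function(prev, this) {
  z <- prev[1] + this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}

# def/B rows from the sample data, as c(Dailysum, Ind, A.Factor).
rows <- list(c(2, 0, 0), c(-6, 0, 0), c(9, 0, 0))
steps <- Reduce(fun, rows, init = c(0, 0), accumulate = TRUE)
# steps[[1]] is the init value c(0, 0)
# steps[[2]] is c(2, 0): 0 + 2 = 2, nothing sold
# steps[[3]] is c(0, 4): 2 - 6 = -4, floored to 0, shortfall 4
# steps[[4]] is c(9, 0): 0 + 9 = 9, nothing sold

# Made-up case with two negatives in a row.
neg <- Reduce(fun, list(c(-4, 0, 0), c(-3, 0, 0)),
              init = c(0, 0), accumulate = TRUE)
# neg[[2]] is c(0, 4) and neg[[3]] is c(0, 3)
```

With this definition, each row records only its own shortfall, because the floored CumulativeSum (not the negative value) is what gets carried forward. If you need the shortfall to accumulate across consecutive negatives instead, fun is the place to change that.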

