How to apply cummax() to specific values within a series in a column in R

Question

I have a data.table, df, in R. It looks like this:

> library(data.table)
> seq <- c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249, 265, 276, 284, 298, 2, 12, 13, 17,
          152, 154, 159,
          66, 69, 74, 81, 88, 91, 93, 94, 95, 96)
> cashreg <- rep(c('c1', 'c2', 'c3'), c(21, 3, 10))
> df <- data.table(seq, cashreg)
> df
    seq cashreg
 1: 200      c1
 2: 208      c1
 3: 212      c1
 4: 215      c1
 5: 218      c1
 6:  25      c1
 7:  28      c1
 8: 232      c1
 9: 236      c1
10: 245      c1
11: 247      c1
12: 248      c1
13: 249      c1
14: 265      c1
15: 276      c1
16: 284      c1
17: 298      c1
18:   2      c1
19:  12      c1
20:  13      c1
21:  17      c1
22: 152      c2
23: 154      c2
24: 159      c2
25:  66      c3
26:  69      c3
27:  74      c3
28:  81      c3
29:  88      c3
30:  91      c3
31:  93      c3
32:  94      c3
33:  95      c3
34:  96      c3

I also have a user-defined maximum for the series:

actual_maximum <- 299

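I want to obtain monotonically increasing sequences before and after min(maximum_in_series, actual_maximum). Here maximum_in_series is the maximum value of seq within each cashreg.

To find the maximum seq for each cashreg, I tried: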

> df[df[,.I[which.max(seq)], by = cashreg]$V1]
   seq cashreg
1: 298      c1
2: 159      c2
3:  96      c3
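
As a minimal sketch of the min(maximum_in_series, actual_maximum) step described above, the capped maximum can be computed per group in one pass. The table name `caps` and the column name `cap` are hypothetical and only used for this illustration.

```r
library(data.table)

seq <- c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
         265, 276, 284, 298, 2, 12, 13, 17,
         152, 154, 159,
         66, 69, 74, 81, 88, 91, 93, 94, 95, 96)
cashreg <- rep(c("c1", "c2", "c3"), c(21, 3, 10))
df <- data.table(seq, cashreg)
actual_maximum <- 299

# Maximum of seq within each cash register
caps <- df[, .(maximum_in_series = max(seq)), by = cashreg]
# Cap it with the user-defined limit; `cap` is a hypothetical column name
caps[, cap := pmin(maximum_in_series, actual_maximum)]
print(caps)
```

With this data every group maximum is below 299, so the cap equals the group maximum in all three cases.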

I want to remove the out-of-order sequence numbers before and after these maxima. I am trying to use cummax(seq) for each cashreg.
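
To illustrate why a single cummax() pass over a whole group is not enough, the sketch below filters the c1 values in one go. It uses a plain vector `x` (a hypothetical name) rather than the data.table.

```r
# c1 values copied from the table above
x <- c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
       265, 276, 284, 298, 2, 12, 13, 17)

# Keep only values that equal the running maximum of the whole series
single_pass <- x[x == cummax(x)]
print(single_pass)

# Values that were dropped: the outliers 25 and 28, but also the
# restarted run after 298, which should be kept
print(setdiff(x, single_pass))
```

The single pass drops 2, 12, 13 and 17 along with 25 and 28, which is why the series has to be split at its maximum first.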

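For example, in cashreg c1 I want to apply cummax(seq) up to min(maximum_in_series, actual_maximum), which is 298, and remove sequence numbers 25 and 28. The maximum should then be computed again for the remaining series (2, 12, 13, 17); in this case it is 17, so I want to apply cummax(seq) to that part as well.

This process should be carried out for each cashreg group.

The expected output looks like this: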
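
The procedure described for c1 can be sketched on a plain vector as follows. This is only an illustration under one assumption: the split point is the position of the series maximum, which is valid here because 298 is below actual_maximum. How values above actual_maximum should be treated is not specified, so the sketch does not handle that case. The helper `keep_monotone` is hypothetical.

```r
# c1 values copied from the table above
x <- c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
       265, 276, 284, 298, 2, 12, 13, 17)

# Split point: position of the series maximum (298)
i_max <- which.max(x)
head_part <- x[seq_len(i_max)]
# seq_len() gives an empty tail when the maximum is the last element
tail_part <- x[i_max + seq_len(length(x) - i_max)]

# Hypothetical helper: keep a value only if it equals the running
# maximum of its own part
keep_monotone <- function(v) v[v == cummax(v)]

cleaned <- c(keep_monotone(head_part), keep_monotone(tail_part))
print(cleaned)
```

Filtering each part separately removes 25 and 28 from the first run and leaves 2, 12, 13, 17 intact.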

    seq cashreg
 1: 200      c1
 2: 208      c1
 3: 212      c1
 4: 215      c1
 5: 218      c1
 6: 232      c1
 7: 236      c1
 8: 245      c1
 9: 247      c1
10: 248      c1
11: 249      c1
12: 265      c1
13: 276      c1
14: 284      c1
15: 298      c1
16:   2      c1
17:  12      c1
18:  13      c1
19:  17      c1
20: 152      c2
21: 154      c2
22: 159      c2
23:  66      c3
24:  69      c3
25:  74      c3
26:  81      c3
27:  88      c3
28:  91      c3
29:  93      c3
30:  94      c3
31:  95      c3
32:  96      c3

How can I do this in R using data.table?

Answer 1

library(data.table)

# Assumption: path2data was not defined in the original answer; a temporary
# directory is used here so that the snippet runs as-is.
path2data <- tempdir()
out_file <- file.path(path2data, "temp_cashreg_clean.txt")

for (cr in unique(df$cashreg)) {
  df_cashreg <- df[cashreg == cr]
  # Row index of the group maximum, used as the split point
  i_max <- which.max(df_cashreg$seq)
  df1 <- df_cashreg[seq_len(i_max)]
  # seq_len() yields an empty tail when the maximum is the last row;
  # (i_max + 1):nrow(df_cashreg) would count backwards and pick up an NA row
  df2 <- df_cashreg[i_max + seq_len(nrow(df_cashreg) - i_max)]
  # Keep only rows equal to the running maximum within each part
  df1 <- df1[seq == cummax(seq)]
  df2 <- df2[seq == cummax(seq)]
  df_combined <- rbind(df1, df2)
  # Append to the temporary file, writing the header only on the first write
  first_write <- !file.exists(out_file)
  write.table(df_combined, out_file, row.names = FALSE, col.names = first_write,
              append = !first_write, sep = "\t", quote = FALSE)
}

df <- fread(out_file, colClasses = c(cashreg = "character"))
# Kept from the original answer: drops exact duplicate rows
df <- unique(df)
file.remove(out_file)
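
The same split-then-filter idea can also be sketched without a loop or a temporary file by grouping on the cash register and on the part of the series. This is not part of the original answer, just one possible formulation; the column name `part` and the result name `cleaned` are hypothetical, and actual_maximum is not applied because every group maximum in this data is below it.

```r
library(data.table)

seq <- c(200, 208, 212, 215, 218, 25, 28, 232, 236, 245, 247, 248, 249,
         265, 276, 284, 298, 2, 12, 13, 17,
         152, 154, 159,
         66, 69, 74, 81, 88, 91, 93, 94, 95, 96)
cashreg <- rep(c("c1", "c2", "c3"), c(21, 3, 10))
df <- data.table(seq, cashreg)

# Part id within each register: 1 up to and including the maximum, 2 after it.
# `part` is a hypothetical helper column.
df[, part := 1L + (seq_len(.N) > which.max(seq)), by = cashreg]

# Keep rows equal to the running maximum within each (cashreg, part) block
cleaned <- df[, .SD[seq == cummax(seq)], by = .(cashreg, part)]

# Drop the helper column and restore the original column order
cleaned[, part := NULL]
setcolorder(cleaned, c("seq", "cashreg"))
print(cleaned)
```

Grouping by both columns keeps the rows in their original order, so the result can be compared directly with the expected output in the question.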

Solution 1
0 2017-06-01 13:05:26
