Find the first incidence (row) of a value that is x amount greater/less than the current value (iterated through each row in a data frame)

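I've been doing my best, but I haven't quite gotten there. I'm trying to iterate over the values in a vector (df$sample) and find the first occurrence of a value that is 20% less than the current value. I want to find this for each row (sample) and write the date of the found value to a new column.

Here is my df: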

    date       sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
...

My attempts used Position() or which(). I thought I could wrap either of them in a for loop, but my attempts aren't quite right.

for(i in length(df){

df$conc20 <- Position(function(x) x < df$sample[i]*0.80, df$sample)
}

or

for(i in length(df){

df$conc20 <- min(which(df$sample < df$sample[i]*0.8)

}

I even found a dplyr example that comes close to what I'm looking for.

Ideally:

    date       sample   conc20
591 2020-02-14 0.008470 2020-02-25
590 2020-02-15 0.008460 ...
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
...

I'm happy to provide any clarification. I'd really appreciate your help!

Edited answer

df<- read.csv( sep = " ",  text=
                 "row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)
df$date <- as.Date(as.character(df$date))
df

#there is no row 20% below, so I am just using 2% below 
# and multiplying 0.98 instead of 0.8

# Finding cross-over before current row    
f_crossover_before <- function(i) {
  cutoff <- 0.98 * df$sample[i]
  # Last row up to i at or below the cutoff; -1 is a sentinel for "no match"
  res <- max(which(df$sample[1:i] <= cutoff), -1)
  if (res > 0) res else NA  # return a row index: sapply cannot return dates!
}

# Finding cross-over after current row
f_crossover_after <- function(i) {
  cutoff <- 0.98 * df$sample[i]
  # seq_len() avoids the backwards range (i+1):nrow(df) on the last row
  later <- i + seq_len(nrow(df) - i)
  res <- min(later[df$sample[later] <= cutoff], .Machine$integer.max)
  if (res < .Machine$integer.max) res else NA
}



# A column for  comparison. Only for visual inspection 
df$cutoff<- df$sample*0.98 


df$crossover_before<- sapply( seq_along(df$sample) ,  FUN = f_crossover_before )
df$crossover_before<- df$date[df$crossover_before]

df$crossover_after<- sapply( seq_along(df$sample) ,  FUN = f_crossover_after)
df$crossover_after<- df$date[df$crossover_after]




#View(df)

Output:

#   row       date   sample     cutoff crossover_before crossover_after
#1  591 2020-02-14 0.008470 0.00830060             <NA>      2020-02-16
#2  590 2020-02-15 0.008460 0.00829080             <NA>      2020-02-16
#3  589 2020-02-16 0.007681 0.00752738             <NA>      2020-02-17
#4  588 2020-02-17 0.007144 0.00700112             <NA>      2020-02-20
#5  587 2020-02-18 0.007262 0.00711676             <NA>      2020-02-20
#6  586 2020-02-19 0.007300 0.00715400       2020-02-17      2020-02-20
#7  585 2020-02-20 0.006604 0.00647192             <NA>      2020-02-26
#8  584 2020-02-21 0.006843 0.00670614       2020-02-20      2020-02-22
#9  583 2020-02-22 0.006687 0.00655326             <NA>      2020-02-26
#10 582 2020-02-23 0.006991 0.00685118       2020-02-22      2020-02-25
#11 581 2020-02-24 0.007333 0.00718634       2020-02-23      2020-02-25
#12 580 2020-02-25 0.006738 0.00660324             <NA>      2020-02-26
#13 579 2020-02-26 0.006279 0.00615342             <NA>            <NA>
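
The comment "sapply cannot return dates" in the code above is why the helper functions return row indices that are turned into dates afterwards. A minimal sketch of that behaviour, using made-up variable names (dates, stripped, restored) that are not part of the answer:

```r
dates <- as.Date(c("2020-02-14", "2020-02-15"))

# sapply() simplifies its result with unlist(), which drops the Date class,
# so the values come back as plain day counts since 1970-01-01.
stripped <- sapply(1:2, function(i) dates[i])
class(stripped)

# Returning indices and subsetting the Date vector afterwards keeps the class.
restored <- dates[sapply(1:2, function(i) i)]
class(restored)
```

This is the same pattern as df$date[df$crossover_after] in the answer: compute integer positions first, then index the date column once.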

It's fairly messy, but this should do the trick:

library(dplyr)
df<- read.csv( sep = " ",  text=
                 "row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)

x <- 1.05

df <- df %>%
  mutate(id =  1:n()) %>% 
  rowwise %>% 
  mutate(greater_row = 
           first(which(sample*x <
                         df$sample[id:nrow(df)]) + 
                   id-1))
df$greater_row <- df$date[df$greater_row]

This should let you set x to whatever factor you want.
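
As a rough illustration of making the factor configurable without dplyr, here is a base R sketch. The helper first_crossing() and its arguments are hypothetical names introduced for this example, not part of any answer; it assumes the values are in date order and only looks at later rows.

```r
# Hypothetical helper: for each element, return the index of the first LATER
# element that is below (below = TRUE) or above (below = FALSE)
# factor * current value, or NA if there is none.
first_crossing <- function(values, factor, below = TRUE) {
  n <- length(values)
  vapply(seq_len(n), function(i) {
    later <- i + seq_len(n - i)        # indices after i; empty on the last row
    target <- factor * values[i]
    hit <- if (below) values[later] < target else values[later] > target
    w <- which(hit)
    if (length(w) > 0) later[w[1]] else NA_integer_
  }, integer(1))
}

# The 13 sample values from the question, with consecutive daily dates.
sample_values <- c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
                   0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
                   0.006279)
dates <- seq(as.Date("2020-02-14"), by = "day", length.out = length(sample_values))

idx <- first_crossing(sample_values, factor = 0.8)   # 20% below
conc20 <- dates[idx]                                 # NA where nothing qualifies
```

Calling first_crossing(sample_values, 1.05, below = FALSE) would mirror the "greater than x times the current value" direction used in the dplyr code.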

If I understand correctly, this can be solved with a non-equi self join that uses two helper columns:

library(data.table)
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
  , conc20 := df[df, on = .(rn > rn, sample < threshold), mult = "first", x.date]][
    , c("rn", "threshold") := NULL][]
          date   sample     conc20
 1: 2020-02-14 0.008470 2020-02-20
 2: 2020-02-15 0.008460 2020-02-20
 3: 2020-02-16 0.007681 2020-02-27
 4: 2020-02-17 0.007144 2020-02-27
 5: 2020-02-18 0.007262 2020-02-27
 6: 2020-02-19 0.007300 2020-02-27
 7: 2020-02-20 0.006604       <NA>
 8: 2020-02-21 0.006843 2020-02-27
 9: 2020-02-22 0.006687 2020-02-27
10: 2020-02-23 0.006991 2020-02-27
11: 2020-02-24 0.007333 2020-02-27
12: 2020-02-25 0.006738 2020-02-27
13: 2020-02-26 0.006279       <NA>
14: 2020-02-27 0.005300       <NA>

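Explanation

The first condition in the on = clause ensures that only subsequent rows are considered. The second condition looks for sample < threshold, where threshold has been defined beforehand as 80% of sample. The helper column rn holds the row number (created via the special symbol .I). In addition, mult = "first" picks the first match when there are multiple matches.

The result is appended by reference as an additional column conc20, i.e., without copying the whole data set. Finally, the two helper columns are removed by reference.

Note that chaining is used.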
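
To make the two join conditions concrete, the same pairwise logic can be sketched in base R with a match matrix. This is only an illustration of what the conditions mean, not how data.table evaluates the join; the names is_match and first_match are made up for the example, and the data are the 14 rows from the Data section.

```r
sample_values <- c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
                   0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
                   0.006279, 0.005300)
dates <- seq(as.Date("2020-02-14"), by = "day", length.out = length(sample_values))

rn <- seq_along(sample_values)        # row numbers, like rn := .I
threshold <- 0.8 * sample_values      # like threshold := 0.8 * sample

# is_match[i, j] is TRUE when row j comes after row i (rn condition)
# AND the sample in row j is below the threshold of row i (sample condition).
is_match <- outer(rn, rn, "<") & outer(threshold, sample_values, ">")

# The counterpart of mult = "first": take the first matching row, NA if none.
first_match <- apply(is_match, 1, function(r) which(r)[1])
conc20 <- dates[first_match]
```

The matrix has one cell per pair of rows, so this sketch only suits small inputs; for larger data the join shown above is likely the better choice.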

For demonstration, the result of the non-equi self join can be shown including all helper columns:

setDT(df)[, rn := .I][, threshold := 0.8 * sample][
  df, on = .(rn > rn, sample < threshold), mult = "first"]
          date    sample rn threshold     i.date i.sample
 1: 2020-02-20 0.0067760  1 0.0052832 2020-02-14 0.008470
 2: 2020-02-20 0.0067680  2 0.0052832 2020-02-15 0.008460
 3: 2020-02-27 0.0061448  3 0.0042400 2020-02-16 0.007681
 4: 2020-02-27 0.0057152  4 0.0042400 2020-02-17 0.007144
 5: 2020-02-27 0.0058096  5 0.0042400 2020-02-18 0.007262
 6: 2020-02-27 0.0058400  6 0.0042400 2020-02-19 0.007300
 7:       <NA> 0.0052832  7        NA 2020-02-20 0.006604
 8: 2020-02-27 0.0054744  8 0.0042400 2020-02-21 0.006843
 9: 2020-02-27 0.0053496  9 0.0042400 2020-02-22 0.006687
10: 2020-02-27 0.0055928 10 0.0042400 2020-02-23 0.006991
11: 2020-02-27 0.0058664 11 0.0042400 2020-02-24 0.007333
12: 2020-02-27 0.0053904 12 0.0042400 2020-02-25 0.006738
13:       <NA> 0.0050232 13        NA 2020-02-26 0.006279
14:       <NA> 0.0042400 14        NA 2020-02-27 0.005300

Data

library(data.table)
df <- fread("
i   date       sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
580 2020-02-27 0.005300
", drop = 1L)
