Find the first incidence (row) of a value that is x amount greater/less than the current value (iterated through each row in a data frame)
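I've been trying my best but haven't quite gotten there. I'm trying to iterate over the values in a vector (df$sample) and find the first incidence of a value that is 20% smaller than the current value. I want to find this for each row (sample) and write the date of the found value to a new column.
Here is my df: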
date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
...
My attempts used Position() or which(). I thought I could wrap either of them in a for loop, but my attempts aren't quite right.
for(i in length(df)){
df$conc20 <- Position(function(x) x < df$sample[i]*0.80, df$sample)
}
or
for(i in length(df)){
df$conc20 <- min(which(df$sample < df$sample[i]*0.8))
}
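The two attempts above fail for three reasons: `length(df)` is the number of columns, so the loop body runs only once instead of once per row; `df$conc20 <- ...` overwrites the whole column on every iteration instead of filling element `i`; and both `Position()` and `which()` search the entire vector rather than only the rows after `i`. A minimal sketch that fixes all three while keeping the `Position()` idea is shown below. It assumes the question's 13 rows with consecutive daily dates; the nested search is O(n²), which is fine for data of this size.

```r
# Sample data from the question (the 13 rows shown)
df <- data.frame(
  date = as.Date("2020-02-14") + 0:12,
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279)
)

# Pre-allocate the result as a Date column so unmatched rows stay NA dates
df$conc20 <- as.Date(NA)

for (i in seq_len(nrow(df))) {
  # Only consider rows strictly after row i
  later <- seq_len(nrow(df)) > i
  # Position() returns the index of the first element satisfying the
  # predicate, or NA if there is none
  hit <- Position(function(x) x < df$sample[i] * 0.80, df$sample[later])
  if (!is.na(hit)) {
    # hit counts from row i + 1, so shift it back to a row number
    df$conc20[i] <- df$date[i + hit]
  }
}

print(df)
```

With these 13 rows only the first two rows find a value more than 20% lower (2020-02-20); every other row gets `NA`.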
I even found a dplyr example that is close to what I'm looking for.
Ideally:
date sample conc20
591 2020-02-14 0.008470 2020-02-25
590 2020-02-15 0.008460 ...
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
...
I'm happy to provide any clarification. I really appreciate your help!
Edited answer
df<- read.csv( sep = " ", text=
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)
df$date=as.Date(as.character(df$date))
df
#there is no row 20% below, so I am just using 2% below
# and multiplying 0.98 instead of 0.8
# Finding cross-over before current row
f_crossover_before<- function( i ){
cutoff= 0.98* df$sample[i]
res<- max(which( df$sample[1:i]<= cutoff), -1)
ifelse ( (res>0) , res , NA ) # sapply cannot return dates !
}
# Finding cross-over after current row
f_crossover_after<- function( i ){
cutoff<- 0.98* df$sample[i]
res<- min( i+which( df$sample[(i+1):nrow(df)]<= cutoff),
.Machine$integer.max )
ifelse ( (res<.Machine$integer.max) , res , NA )
}
# A column for comparison. Only for visual inspection
df$cutoff<- df$sample*0.98
df$crossover_before<- sapply( seq_along(df$sample) , FUN = f_crossover_before )
df$crossover_before<- df$date[df$crossover_before]
df$crossover_after<- sapply( seq_along(df$sample) , FUN = f_crossover_after)
df$crossover_after<- df$date[df$crossover_after]
#View(df)
Output:
# row date sample cutoff crossover_before crossover_after
#1 591 2020-02-14 0.008470 0.00830060 <NA> 2020-02-16
#2 590 2020-02-15 0.008460 0.00829080 <NA> 2020-02-16
#3 589 2020-02-16 0.007681 0.00752738 <NA> 2020-02-17
#4 588 2020-02-17 0.007144 0.00700112 <NA> 2020-02-20
#5 587 2020-02-18 0.007262 0.00711676 <NA> 2020-02-20
#6 586 2020-02-19 0.007300 0.00715400 2020-02-17 2020-02-20
#7 585 2020-02-20 0.006604 0.00647192 <NA> 2020-02-26
#8 584 2020-02-21 0.006843 0.00670614 2020-02-20 2020-02-22
#9 583 2020-02-22 0.006687 0.00655326 <NA> 2020-02-26
#10 582 2020-02-23 0.006991 0.00685118 2020-02-22 2020-02-25
#11 581 2020-02-24 0.007333 0.00718634 2020-02-23 2020-02-25
#12 580 2020-02-25 0.006738 0.00660324 <NA> 2020-02-26
#13 579 2020-02-26 0.006279 0.00615342 <NA> <NA>
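The comment `# sapply cannot return dates !` in `f_crossover_before()` refers to the fact that `sapply()` simplifies its result with `unlist()`, which strips the `Date` class and leaves the underlying day counts (`ifelse()` drops the class as well). That is why the helper functions return row indices and the dates are looked up afterwards with `df$date[...]`. A small illustration of the effect and two ways around it:

```r
dates <- as.Date("2020-02-14") + 0:2

# sapply() simplifies the list of results via unlist(), which drops the class
d_num <- sapply(1:3, function(i) dates[i])
print(class(d_num))   # "numeric"

# Returning row indices and subsetting afterwards (as the answer does) keeps it
idx <- sapply(1:3, function(i) i)
d_ok <- dates[idx]
print(class(d_ok))    # "Date"

# Alternatively, combine the list with c(), which dispatches to c.Date()
d_ok2 <- do.call(c, lapply(1:3, function(i) dates[i]))
print(class(d_ok2))   # "Date"
```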
Pretty messy, but this should do the trick:
library(dplyr)
df<- read.csv( sep = " ", text=
"row date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279"
)
x <- 1.05
df <- df %>%
mutate(id = 1:n()) %>%
rowwise() %>%
mutate(greater_row =
first(which(sample*x <
df$sample[id:nrow(df)]) +
id-1))
df$greater_row <- df$date[df$greater_row]
This should allow you to set `x` to any factor you want.
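The dplyr version above looks for values *greater* than `sample * x`; for the question's goal (the first later value more than 20% *below* the current one), the comparison has to be flipped and `x` set to 0.8. A base-R sketch of that idea, using a hypothetical helper `first_crossing()` (not part of the answer) that picks the direction from whether `x` is below or above 1 and returns row indices:

```r
# Hypothetical helper: for each position, the index of the first later
# element that is below v[id] * x (when x < 1) or above it (when x > 1)
first_crossing <- function(v, x) {
  n <- length(v)
  vapply(seq_len(n), function(id) {
    rest <- v[seq_len(n) > id]            # elements strictly after id
    hit <- if (x < 1) which(rest < v[id] * x) else which(rest > v[id] * x)
    if (length(hit) == 0) NA_integer_ else id + hit[1]
  }, integer(1))
}

s <- c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
       0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
       0.006279)
dates <- as.Date("2020-02-14") + 0:12

first_crossing(s, 0.8)           # 7 7 NA NA NA NA NA NA NA NA NA NA NA
dates[first_crossing(s, 0.8)]    # index -> date, as in the answer's last line
first_crossing(s, 1.05)          # the "5% greater" case from the answer
```

Returning indices rather than dates sidesteps the class-dropping behaviour of `vapply()`/`sapply()`; the dates are looked up in one step at the end.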
If I understand correctly, this can be solved by a non-equi self-join using two helper columns:
library(data.table)
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
, conc20 := df[df, on = .(rn > rn, sample < threshold), mult = "first", x.date]][
, c("rn", "threshold") := NULL][]
          date   sample     conc20
 1: 2020-02-14 0.008470 2020-02-20
 2: 2020-02-15 0.008460 2020-02-20
 3: 2020-02-16 0.007681 2020-02-27
 4: 2020-02-17 0.007144 2020-02-27
 5: 2020-02-18 0.007262 2020-02-27
 6: 2020-02-19 0.007300 2020-02-27
 7: 2020-02-20 0.006604       <NA>
 8: 2020-02-21 0.006843 2020-02-27
 9: 2020-02-22 0.006687 2020-02-27
10: 2020-02-23 0.006991 2020-02-27
11: 2020-02-24 0.007333 2020-02-27
12: 2020-02-25 0.006738 2020-02-27
13: 2020-02-26 0.006279       <NA>
14: 2020-02-27 0.005300       <NA>
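The first condition in the `on =` clause ensures that only subsequent rows are considered; the second condition looks for `sample < threshold`, where `threshold` has been defined beforehand as 80% of `sample`. The helper column `rn` contains the row number (created with the data.table special symbol `.I`). Furthermore, `mult = "first"` selects the first match in case of multiple matches.
The result is appended by reference as the additional column `conc20`, i.e., without copying the whole data set. Finally, the two helper columns are removed by reference.
Note that data.table chaining is used.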
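To see what the non-equi self-join computes, the same logic can be spelled out in base R as a pairwise condition matrix: entry `[i, j]` is `TRUE` when row `j` comes after row `i` (`rn > rn`) and its `sample` is below row `i`'s `threshold`; taking the first `TRUE` in each row mirrors `mult = "first"`. This is only an illustration on the 14-row data used in this answer; it builds an n × n matrix, so it is practical for small data only.

```r
df <- data.frame(
  date = as.Date("2020-02-14") + 0:13,
  sample = c(0.008470, 0.008460, 0.007681, 0.007144, 0.007262, 0.007300,
             0.006604, 0.006843, 0.006687, 0.006991, 0.007333, 0.006738,
             0.006279, 0.005300)
)
df$rn <- seq_len(nrow(df))          # helper column: row number
df$threshold <- 0.8 * df$sample     # helper column: 80% of sample

# cond[i, j]: row j is after row i AND sample[j] < threshold[i]
cond <- outer(df$rn, df$rn,
              function(i, j) j > i & df$sample[j] < df$threshold[i])

# mult = "first": index of the first matching j per row i (NA if none)
first_j <- apply(cond, 1, function(r) match(TRUE, r))
df$conc20 <- df$date[first_j]

print(df[, c("date", "sample", "conc20")])
```

The `conc20` column produced this way agrees with the join result shown above.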
For demonstration, here is the result of the non-equi self-join including all helper columns:
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
df, on = .(rn > rn, sample < threshold), mult = "first"]
          date    sample rn threshold     i.date i.sample
 1: 2020-02-20 0.0067760  1 0.0052832 2020-02-14 0.008470
 2: 2020-02-20 0.0067680  2 0.0052832 2020-02-15 0.008460
 3: 2020-02-27 0.0061448  3 0.0042400 2020-02-16 0.007681
 4: 2020-02-27 0.0057152  4 0.0042400 2020-02-17 0.007144
 5: 2020-02-27 0.0058096  5 0.0042400 2020-02-18 0.007262
 6: 2020-02-27 0.0058400  6 0.0042400 2020-02-19 0.007300
 7:       <NA> 0.0052832  7        NA 2020-02-20 0.006604
 8: 2020-02-27 0.0054744  8 0.0042400 2020-02-21 0.006843
 9: 2020-02-27 0.0053496  9 0.0042400 2020-02-22 0.006687
10: 2020-02-27 0.0055928 10 0.0042400 2020-02-23 0.006991
11: 2020-02-27 0.0058664 11 0.0042400 2020-02-24 0.007333
12: 2020-02-27 0.0053904 12 0.0042400 2020-02-25 0.006738
13:       <NA> 0.0050232 13        NA 2020-02-26 0.006279
14:       <NA> 0.0042400 14        NA 2020-02-27 0.005300
Data (includes one additional row for 2020-02-27 compared with the question):
library(data.table)
df <- fread("
i date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
580 2020-02-27 0.005300
", drop = 1L)