How to refer to multiple previous rows in R data.table

I have a question about data.table in R. I have a dataset like this:

data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))

     a b
 1:  1 1
 2:  2 5
 3:  3 6
 4:  4 7
 5:  5 8
 6:  6 3
 7:  7 2
 8: 12 5
 9: 32 1
10: 13 4

Now I want to generate a third column, c, that compares the value of a in each row with all previous values of b and checks whether any of those values of b is bigger than a. For example, at row 5, a = 5 and the previous values of b are 1, 5, 6, 7. Since 6 and 7 are bigger than 5, the value of c should be 1; otherwise it would be 0. The first row has no previous rows, so c is NA there. The result should look like this:

     a b  c
 1:  1 1 NA
 2:  2 5  0
 3:  3 6  1
 4:  4 7  1
 5:  5 8  1
 6:  6 3  1
 7:  7 2  1
 8: 12 5  0
 9: 32 1  0
10: 13 4  0

I tried a for loop, but it takes a very long time. I also tried shift, but I cannot refer to multiple previous rows with it. Does anyone have a recommendation?

library(data.table)
data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))
# Strict '<' matches "bigger than a"; as.integer() gives 1/0 rather than TRUE/FALSE.
data[, c := as.integer(a < shift(cummax(b)))]
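
The one-liner above works because "some earlier value of b is bigger than a" is equivalent to "the largest earlier value of b is bigger than a". Here is a minimal sketch that spells out the intermediate steps, using a strict comparison as the question asks. The helper column names run_max and prev_max are only illustrative and are not part of the original answer.

```r
library(data.table)

data <- data.table(a = c(1:7, 12, 32, 13), b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))

# Running maximum of b up to and including the current row.
data[, run_max := cummax(b)]

# Shift down one row so each row only sees strictly earlier rows;
# the first row has no predecessor, so it becomes NA.
data[, prev_max := shift(run_max)]

# 1 if the largest earlier b is strictly greater than a, else 0 (NA in row 1).
data[, c := as.integer(a < prev_max)]

print(data)
```

Since cummax and shift are each a single vectorised pass over the column, this avoids the row-by-row rescanning that makes an explicit loop slow.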

This is a base R solution (see the dplyr solution below):

data$c <- NA_integer_
# For each row x from 2 onward: 1 if any earlier b is bigger than a[x], else 0.
data$c[2:nrow(data)] <- sapply(2:nrow(data), function(x) as.integer(any(data$a[x] < data$b[1:(x-1)])))

##      a b  c
##  1:  1 1 NA
##  2:  2 5  0
##  3:  3 6  1
##  4:  4 7  1
##  5:  5 8  1
##  6:  6 3  1
##  7:  7 2  1
##  8: 12 5  0
##  9: 32 1  0
## 10: 13 4  0

EDIT

Here is a simpler solution using dplyr:

library(dplyr)
### Compare 'a' with the lagged cumulative max of 'b' and set 'c' to 1/0.
data %>% mutate(c = ifelse(a < lag(cummax(b)), 1, 0))

##     a b  c
## 1   1 1 NA
## 2   2 5  0
## 3   3 6  1
## 4   4 7  1
## 5   5 8  1
## 6   6 3  1
## 7   7 2  1
## 8  12 5  0
## 9  32 1  0
## 10 13 4  0

### Using 'shift' (from data.table) with dplyr; strict '<' to match the version above
data %>% mutate(c = ifelse(a < shift(cummax(b)), 1, 0))
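
As a sanity check, the vectorised approach can be compared with the row-by-row definition from the question on random data. This is only a sketch: the function names brute_force and vectorised are hypothetical, and it assumes b contains no NA values (cummax propagates NA once it meets one).

```r
library(data.table)

# Direct definition: for row i, is any earlier b strictly greater than a[i]?
brute_force <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)
  for (i in seq_len(n)[-1]) {
    out[i] <- as.integer(any(b[seq_len(i - 1)] > a[i]))
  }
  out
}

# Vectorised version: compare a with the running maximum of the earlier b values.
vectorised <- function(a, b) as.integer(a < shift(cummax(b)))

set.seed(1)
a <- sample(1:50, 200, replace = TRUE)
b <- sample(1:50, 200, replace = TRUE)

# TRUE when both approaches agree on every row, including the NA in row 1.
identical(brute_force(a, b), vectorised(a, b))
```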
