How to refer to multiple previous rows in R data.table
I have a question regarding data.table in R. I have a dataset like this:
library(data.table)
data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))
a b
1: 1 1
2: 2 5
3: 3 6
4: 4 7
5: 5 8
6: 6 3
7: 7 2
8: 12 5
9: 32 1
10: 13 4
Now I want to generate a third column, c, that compares the value of a in each row with all previous values of b and checks whether any of those values of b is bigger than a. For example, at row 5, a = 5 and the previous values of b are 1, 5, 6, 7. Since 6 and 7 are bigger than 5, the value of c should be 1; otherwise it would be 0. Row 1 has no previous rows, so c is NA there. The result should look like this:
a b c
1: 1 1 NA
2: 2 5 0
3: 3 6 1
4: 4 7 1
5: 5 8 1
6: 6 3 1
7: 7 2 1
8: 12 5 0
9: 32 1 0
10: 13 4 0
I tried a for loop, but it takes a very long time. I also tried shift, but I cannot refer to multiple previous rows with shift. Does anyone have a recommendation?
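library(data.table)
data <- data.table(a=c(1:7,12,32,13),b=c(1,5,6,7,8,3,2,5,1,4))
# Some previous b is bigger than a exactly when the maximum of the previous b values is,
# so compare a with cummax(b) shifted down one row (NA in row 1, which has no previous row).
# Strict '<' keeps ties from counting as "bigger"; as.integer() turns TRUE/FALSE into 1/0.
data[, c := as.integer(a < shift(cummax(b)))]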
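For what it is worth, data.table's shift() is not limited to a single previous row: when n is a vector, it returns a list with one lagged vector per offset. The minimal sketch below (the names nr, prev_lags and prev_max are illustrative, not from the original answer) builds every lag from 1 to nrow - 1, takes their row-wise maximum with pmax(), and checks that this equals shift(cummax(b)). It materialises nrow - 1 full-length vectors, i.e. memory quadratic in the number of rows, which is why the cummax() version is the practical choice.

```r
library(data.table)

data <- data.table(a = c(1:7, 12, 32, 13), b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))
nr <- nrow(data)

# shift() with a vector n returns a list: b lagged by 1, 2, ..., nr - 1 rows.
prev_lags <- shift(data$b, n = 1:(nr - 1))

# Row-wise maximum over all lags; NA padding is ignored, and row 1 (all NA) stays NA.
prev_max <- do.call(pmax, c(prev_lags, na.rm = TRUE))

# Same values as the running maximum shifted down one row, which needs no extra copies.
stopifnot(isTRUE(all.equal(prev_max, shift(cummax(data$b)))))

data[, c := as.integer(a < prev_max)]
print(data)
```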
This is a base R solution (see the dplyr solution below):
# O(n^2): for each row i >= 2, test whether any earlier b is bigger than a[i]
data$c <- NA_integer_
data$c[-1] <- sapply(2:nrow(data), function(i) as.integer(any(data$a[i] < data$b[1:(i - 1)])))
## a b c
## 1: 1 1 NA
## 2: 2 5 0
## 3: 3 6 1
## 4: 4 7 1
## 5: 5 8 1
## 6: 6 3 1
## 7: 7 2 1
## 8: 12 5 0
## 9: 32 1 0
## 10: 13 4 0
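A for loop does not have to be slow here. Both the sapply() version above and, presumably, a naive loop rescan all earlier rows at every step, which is quadratic in the number of rows. A loop that carries the running maximum of b forward checks each row once. This is a minimal base R sketch (plain data.frame, no packages), assuming a and b are numeric vectors of equal length without NA values; the function name flag_bigger_previous is hypothetical.

```r
# Hypothetical helper: 1 if any earlier b is bigger than a[i], 0 otherwise, NA in row 1.
# Assumes a and b are numeric vectors of equal length with no NA values.
flag_bigger_previous <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)
  if (n < 2) return(out)
  running_max <- b[1]                      # largest b seen in rows 1..(i - 1)
  for (i in 2:n) {
    out[i] <- as.integer(running_max > a[i])
    running_max <- max(running_max, b[i])  # include row i before moving to row i + 1
  }
  out
}

data <- data.frame(a = c(1:7, 12, 32, 13), b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))
data$c <- flag_bigger_previous(data$a, data$b)
print(data)
```

This is the same idea as cummax(), written out explicitly; it is mainly useful when the per-row rule is more complicated than a running maximum.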
EDIT
Here is a simpler solution using dplyr:
library(dplyr)
### Compare 'a' with the running maximum of the previous 'b' values (lag of cummax); set c to 1/0.
data %>% mutate(c = ifelse(a < lag(cummax(b)), 1, 0))
## a b c
## 1 1 1 NA
## 2 2 5 0
## 3 3 6 1
## 4 4 7 1
## 5 5 8 1
## 6 6 3 1
## 7 7 2 1
## 8 12 5 0
## 9 32 1 0
## 10 13 4 0
### Using data.table's 'shift' instead of dplyr's 'lag' (requires library(data.table))
data %>% mutate(c = ifelse(a < shift(cummax(b)), 1, 0))
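To gain some confidence that the one-line cummax() answers implement the rule exactly as stated, they can be compared with a literal translation of the rule on random data. In the sketch below, the helper brute_force, the seed, and the simulation sizes are arbitrary illustrative choices. Small integers are drawn so that ties between a and earlier b values occur; those ties are where a `<=` comparison departs from the strict "bigger than" in the question, and the script counts such rows.

```r
library(data.table)

# Literal translation of the rule: for row i, is any of b[1:(i - 1)] bigger than a[i]?
brute_force <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)
  for (i in seq_len(n)[-1]) out[i] <- as.integer(any(b[seq_len(i - 1)] > a[i]))
  out
}

set.seed(1)
n_tie_mismatch <- 0
for (trial in 1:200) {
  n <- sample(2:30, 1)
  dt <- data.table(a = sample(1:10, n, replace = TRUE), b = sample(1:10, n, replace = TRUE))
  expected <- brute_force(dt$a, dt$b)
  strict   <- dt[, as.integer(a <  shift(cummax(b)))]
  lenient  <- dt[, as.integer(a <= shift(cummax(b)))]
  stopifnot(identical(strict, expected))  # strict '<' must always agree with the rule
  n_tie_mismatch <- n_tie_mismatch + sum(lenient != expected, na.rm = TRUE)
}
cat("rows where '<=' disagrees with the rule:", n_tie_mismatch, "\n")
```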
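Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.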