How can I compare two rows in R data frame by different columns and perform an operation on them?
I have some data with a similar structure.
test_data <- data.frame(start = c(1,15,17,35),
time = c(87, 1, 35, 3),
end = c(88,16,52,38))
test_data
start time end
1 1 87 88
2 15 1 16
3 17 35 52
4 35 3 38
I want to compare the "start" variable with the "end" variable of the previous row. If the difference is less than 2, I want to merge the two rows: sum their time, keep the start from the first row and the end from the second row.
So in this test data, start in the third observation is 17 and end in the second is 16. The difference is 1, and thus I want to sum them.
I expect the following output:
start time end
1 1 87 88
2 15 36 52
3 35 3 38
Is there a neat way to do this in R? I tried writing a for loop, but it seems that I am overcomplicating it a lot.
You could use data.table with a combination of shift and cumsum:
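library(data.table)
test_data <- data.frame(start = c(1, 15, 17, 35),
                        time = c(87, 1, 35, 3),
                        end = c(88, 16, 52, 38))
# Convert to a data.table by reference
setDT(test_data)
# gid is TRUE where a row starts a new group, FALSE where it continues the previous one
test_data[, gid := !(.I > 1 & shift(end) - start < 2)]
# cumsum(gid) numbers the groups; keep the first start, the summed time and the last end
test_data[, .(start = head(start, 1), time = sum(time), end = tail(end, 1)), by = cumsum(gid)][, -"cumsum"]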
#> start time end
#> 1: 1 87 88
#> 2: 15 36 52
#> 3: 35 3 38
Created on 2021-01-31 by the reprex package (v1.0.0)
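To see why the grouping works, the same flag-and-cumsum idea can be sketched in base R, one step at a time. This is only an illustration and not part of the original answer; the names prev_end, new_group, group_id and merged are made up for the sketch.

```r
test_data <- data.frame(start = c(1, 15, 17, 35),
                        time  = c(87, 1, 35, 3),
                        end   = c(88, 16, 52, 38))

# End of the previous row (NA for the first row), similar to shift(end)
prev_end <- c(NA, head(test_data$end, -1))

# A row opens a new group unless "previous end minus current start" is below 2
# (the same condition as in the data.table code)
new_group <- is.na(prev_end) | !(prev_end - test_data$start < 2)

# The running count of group openings gives one id per block of merged rows
group_id <- cumsum(new_group)

# Collapse each group: first start, summed time, last end
merged <- do.call(rbind, lapply(split(test_data, group_id), function(g) {
  data.frame(start = g$start[1], time = sum(g$time), end = g$end[nrow(g)])
}))
rownames(merged) <- NULL
merged
```

For the test data, group_id comes out as 1, 2, 2, 3, so the second and third rows end up in the same group and are collapsed into a single row.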
Note that the data.table code above would also merge multiple consecutive rows (not just 2) with <2 distance between previous end and current start values.
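A small base R sketch of that chaining behaviour, on made-up data. Two things here are assumptions rather than part of the answer: the helper merge_close_rows and its max_gap argument are hypothetical, and the sketch reads "difference" as the absolute gap, abs(start - previous end). The condition shift(end) - start < 2 is also true whenever start lies far beyond the previous end, so with it the last row below would be merged as well.

```r
# Hypothetical helper, written for illustration only
merge_close_rows <- function(df, max_gap = 2) {
  # End of the previous row, NA for the first row
  prev_end <- c(NA, head(df$end, -1))
  # Assumption: rows are "close" when the absolute gap is below max_gap
  new_group <- is.na(prev_end) | !(abs(df$start - prev_end) < max_gap)
  group_id <- cumsum(new_group)
  data.frame(
    start = as.numeric(tapply(df$start, group_id, function(x) x[1])),
    time  = as.numeric(tapply(df$time, group_id, sum)),
    end   = as.numeric(tapply(df$end, group_id, function(x) x[length(x)]))
  )
}

# Made-up data: three rows that chain together, then one distant row
chain <- data.frame(start = c(1, 6, 11, 30),
                    time  = c(4, 4, 4, 10),
                    end   = c(5, 10, 15, 40))
result <- merge_close_rows(chain)
result
```

Here the first three rows collapse into one row (start 1, time 12, end 15), while the fourth row stays on its own. On the question's test data this helper gives the same three rows as the data.table code.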