How to compare two rows in an R data frame by different columns and perform an operation on them?

Question

I have some data with a structure like this:

test_data <- data.frame(start = c(1,15,17,35),
                         time = c(87, 1, 35, 3),
                         end = c(88,16,52,38))
test_data
  start time end
1     1   87  88
2    15    1  16
3    17   35  52
4    35    3  38

I want to compare the "start" variable with the "end" variable of the previous row. If the difference is less than 2, I want to merge the two rows: sum their time, and keep the start from the first row and the end from the second row.

So in this test data, start in the third observation is 17 and end in the second is 16. The difference is 1, so I want to merge them.

I expect output like this:

  start time end
1     1   87  88
2    15   36  52
3    35    3  38

Is there a neat way to do this in R? I tried writing a for loop, but it seems I am overcomplicating it a lot.

Answer 1

You could use data.table with a combination of shift and cumsum:

library(data.table)
test_data <- data.frame(start = c(1,15,17,35),
                        time = c(87, 1, 35, 3),
                        end = c(88,16,52,38))

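setDT(test_data)
# gid is TRUE when a row starts a new group: it is the first row, or its start
# is not within 2 of the previous row's end (shift(end)).
test_data[, gid := ifelse(.I > 1 & abs(start - shift(end)) < 2, FALSE, TRUE)]
# cumsum(gid) numbers the groups; keep the first start, the summed time, the last end.
test_data[, .(start = head(start, 1), time = sum(time), end = tail(end, 1)), by = cumsum(gid)][, -"cumsum"]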
#>    start time end
#> 1:     1   87  88
#> 2:    15   36  52
#> 3:    35    3  38

Created on 2021-01-31 by the reprex package (v1.0.0)
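
To see how the grouping works, it may help to look at the intermediate values. The following is a minimal base R sketch of the same idea, not the data.table code itself: prev_end, new_group and group_id are illustrative names, and it assumes that "difference" means the absolute gap between a row's start and the previous row's end.

```r
# Base R sketch of the grouping step (no data.table needed).
test_data <- data.frame(start = c(1, 15, 17, 35),
                        time  = c(87, 1, 35, 3),
                        end   = c(88, 16, 52, 38))

# End of the previous row; NA for the first row, like shift(end).
prev_end <- c(NA, head(test_data$end, -1))

# A row starts a new group unless it begins within 2 of the previous end.
# Assumption: "difference" is taken as the absolute gap.
new_group <- is.na(prev_end) | abs(test_data$start - prev_end) >= 2

# The running count of group starts gives one id per merged block.
group_id <- cumsum(new_group)

print(data.frame(prev_end, new_group, group_id))
```

Rows 2 and 3 end up with the same group_id, which is why they are collapsed into a single row when aggregating by that id.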

Note that this also merges runs of more than two consecutive rows, as long as each row's start is less than 2 away from the previous row's end.
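
The chaining behaviour can be checked with a small base R sketch. The data below is made up for illustration, and merge_close_rows and max_gap are hypothetical names, not part of data.table or of the original answer. The sketch assumes the rows are already sorted by start and that the gap is compared in absolute value.

```r
# Hypothetical data: rows 1-3 form a chain (10 -> 11, 20 -> 21); row 4 stands alone.
chain <- data.frame(start = c(1, 11, 21, 50),
                    time  = c(9, 9, 9, 5),
                    end   = c(10, 20, 30, 55))

# Illustrative helper: merge consecutive rows whose start is within
# max_gap of the previous row's end (assumes rows are sorted by start).
merge_close_rows <- function(df, max_gap = 2) {
  prev_end  <- c(NA, head(df$end, -1))
  new_group <- is.na(prev_end) | abs(df$start - prev_end) >= max_gap
  gid       <- cumsum(new_group)
  data.frame(start = as.vector(tapply(df$start, gid, function(x) x[1])),
             time  = as.vector(tapply(df$time, gid, sum)),
             end   = as.vector(tapply(df$end, gid, function(x) x[length(x)])))
}

merged <- merge_close_rows(chain)
print(merged)
#   start time end
# 1     1   27  30
# 2    50    5  55
```

All three chained rows collapse into one, because the group id only increases when a gap of 2 or more is found.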

Answer 1, accepted 2021-01-31 16:41:35
