
How can I compare two rows in an R data frame by different columns and perform an operation on them?

I have some data with a similar structure:

test_data <- data.frame(start = c(1,15,17,35),
                         time = c(87, 1, 35, 3),
                         end = c(88,16,52,38))
test_data
  start time end
1     1   87  88
2    15    1  16
3    17   35  52
4    35    3  38

I want to compare the "start" value of each row with the "end" value of the previous row. If the difference is less than 2, I want to merge the two rows: sum their "time" values, keep the "start" of the first row and the "end" of the second row.

So in this test data, the start of the third observation is 17 and the end of the second is 16. The difference is 1, so I want to merge these two rows.

I expect this output:

  start time end
1     1   87  88
2    15   36  52
3    35    3  38

Is there a neat way to do this in R? I tried writing a for loop, but it seems I am overcomplicating it a lot.
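To see why the rule needs some care, the differences the question describes (each row's start minus the previous row's end) can be computed directly in base R. This is an illustrative sketch; `prev_end` and `diffs` are names introduced here, not part of the question.

```r
test_data <- data.frame(start = c(1, 15, 17, 35),
                        time  = c(87, 1, 35, 3),
                        end   = c(88, 16, 52, 38))

# End value of the previous row; the first row has no predecessor, so NA
prev_end <- c(NA, head(test_data$end, -1))

# Current start minus previous end, row by row
diffs <- test_data$start - prev_end
diffs
#> [1]  NA -73   1 -17
```

Only row 3 gives the difference of 1 from the example. Rows 2 and 4 give negative values because each starts before the previous row ends, yet the expected output keeps them separate, so a literal `start - previous end < 2` test would merge all four rows into one.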

You could use data.table with a combination of shift and cumsum:

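library(data.table)
test_data <- data.frame(start = c(1,15,17,35),
                        time = c(87, 1, 35, 3),
                        end = c(88,16,52,38))

setDT(test_data)  # convert the data.frame to a data.table in place
# gid is TRUE where a new group starts: the first row, or any row where
# the previous row's end minus the current start is not less than 2
test_data[, gid:=ifelse(.I>1 & shift(end) - start < 2, FALSE, TRUE)]
# cumsum(gid) numbers the groups; per group keep the first start,
# sum the time values and keep the last end, then drop the group id column
test_data[,.(start=head(start,1), time=sum(time), end=tail(end,1)), by=cumsum(gid)][,-"cumsum"]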
#>    start time end
#> 1:     1   87  88
#> 2:    15   36  52
#> 3:    35    3  38

Created on 2021-01-31 by the reprex package (v1.0.0)
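If you would rather avoid the data.table dependency, the same shift/cumsum idea can be sketched in base R. This is an illustrative translation, not part of the original answer; `prev_end`, `new_group`, `grp` and `idx` are hypothetical helper names, and the merge condition is copied from the data.table code above.

```r
test_data <- data.frame(start = c(1, 15, 17, 35),
                        time  = c(87, 1, 35, 3),
                        end   = c(88, 16, 52, 38))

# Previous row's end (NA for the first row), analogous to data.table::shift(end)
prev_end <- c(NA, head(test_data$end, -1))

# TRUE where a row starts a new group: the first row, or any row where
# previous end minus current start is not below 2 (same rule as the answer)
new_group <- !(seq_len(nrow(test_data)) > 1 & prev_end - test_data$start < 2)
grp <- cumsum(new_group)

# Row indices belonging to each group, in order of appearance
idx <- split(seq_len(nrow(test_data)), grp)

# Collapse each group: first start, summed time, last end
result <- data.frame(
  start = unname(vapply(idx, function(i) test_data$start[i[1]], numeric(1))),
  time  = unname(vapply(idx, function(i) sum(test_data$time[i]), numeric(1))),
  end   = unname(vapply(idx, function(i) test_data$end[i[length(i)]], numeric(1)))
)
result
#>   start time end
#> 1     1   87  88
#> 2    15   36  52
#> 3    35    3  38
```

Using `split()` on row indices keeps all three columns aligned per group, which is the part that tends to make a hand-written for loop awkward.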

Note that this would also merge more than two consecutive rows, as long as for each of them the previous row's end minus the current row's start is less than 2.
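The note above can be checked on a small hypothetical table (the values are made up for illustration). Rows 2 and 3 each start one unit after the previous row ends, so the answer's rule merges them with row 1. The same rule also merges row 4, whose start is 86 units after the previous end, because `shift(end) - start` is then strongly negative. If the intent is that only rows starting close to the previous end should merge, a variant using `abs(start - shift(end)) < 2` may be closer; that reading of the question is an assumption, although it gives the same result on `test_data`.

```r
library(data.table)

# Hypothetical data: rows 2 and 3 start 1 unit after the previous end;
# row 4 starts 86 units after row 3 ends
dt <- data.table(start = c(1, 7, 10, 100),
                 time  = c(5, 2, 4, 1),
                 end   = c(6, 9, 14, 120))

# Group ids under the answer's rule: previous end minus current start below 2
dt[, grp_answer := cumsum(!(.I > 1 & shift(end) - start < 2))]

# Group ids under an assumed stricter rule: absolute gap below 2
dt[, grp_strict := cumsum(!(.I > 1 & abs(start - shift(end)) < 2))]

# Collapse each grouping: first start, summed time, last end
merged_answer <- dt[, .(start = head(start, 1), time = sum(time), end = tail(end, 1)),
                    by = grp_answer]
merged_strict <- dt[, .(start = head(start, 1), time = sum(time), end = tail(end, 1)),
                    by = grp_strict]

merged_answer
merged_strict
```

Here `merged_answer` has a single row (start 1, time 12, end 120), while `merged_strict` merges rows 1 to 3 (start 1, time 11, end 14) and keeps row 4 separate.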
