Sumifs between two numbers across different dataframes
I have two data frames. One holds "From" and "To" intervals, as follows:
Intervals <- data.frame("From" = c(0.0000,0.0069,0.0139,0.0208,0.0278,0.0347,0.0417,0.0486,0.0556,0.0625,0.0694,0.0764,0.0833),
"To" = c(0.0410,0.0479,0.0549,0.0618,0.0688,0.0757,0.0826,0.0896,0.0965,0.1035,0.1104,0.1174,0.1243))
The second data frame is:
x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0),
"Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,0.013888889,0.013888889,0.020833333,0.024305556,0.027777778,0.03125,0.03125))
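So if the Dummy Time falls between (or is equal to) the From and To of the Intervals df, I basically want to do a sumif of the Dummy variable in R. This is easy in Excel, but I am very new to R.
cbind will not work here, because the rows of Intervals and x do not line up. Basically, the intervals are just a standard day, and I would like to create a new column on Intervals that shows the total number of dummies occurring in that time period.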
The most transparent way I can think of is:
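n_interval = nrow(Intervals)
Intervals$DummySum = numeric(n_interval)
# data.frame() renames "Dummy Time" to Dummy.Time (check.names = TRUE by default)
for (i in seq_len(n_interval)) {
  # rows of x whose time falls in [From, To) for interval i
  ind_i = x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
  Intervals$DummySum[i] = sum(x$Dummy[ind_i])
}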
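This simply loops over all intervals, identifies the dummies that fall within each interval, and sums their values. Note that the upper bound is exclusive here; use <= in place of < if a time equal to To should also count, as the question describes.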
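As a quick check, the sketch below rebuilds both data frames from the question and runs the same loop. It assumes the default check.names = TRUE of data.frame(), which turns the column name "Dummy Time" into Dummy.Time. The expected result in the final comment was worked out by hand from the data in the question.

```r
Intervals <- data.frame(
  From = c(0.0000, 0.0069, 0.0139, 0.0208, 0.0278, 0.0347, 0.0417,
           0.0486, 0.0556, 0.0625, 0.0694, 0.0764, 0.0833),
  To   = c(0.0410, 0.0479, 0.0549, 0.0618, 0.0688, 0.0757, 0.0826,
           0.0896, 0.0965, 0.1035, 0.1104, 0.1174, 0.1243)
)
x <- data.frame(
  "Dummy" = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  "Dummy Time" = c(0, 0, 0.006944444, 0.006944444, 0.010416667,
                   0.010416667, 0.013888889, 0.013888889, 0.020833333,
                   0.024305556, 0.027777778, 0.03125, 0.03125)
)
names(x)  # "Dummy" "Dummy.Time"

Intervals$DummySum <- numeric(nrow(Intervals))
for (i in seq_len(nrow(Intervals))) {
  # TRUE for rows of x whose time lies in [From, To) of interval i
  in_i <- x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
  Intervals$DummySum[i] <- sum(x$Dummy[in_i])
}
Intervals$DummySum
# 2 1 1 1 0 0 0 0 0 0 0 0 0
```

The two dummies equal to 1 occur at times 0 and 0.020833333, so only the first four intervals, whose From is at most 0.0208, pick up a non-zero count.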
If you do not like for loops, you can use sapply:
# x$Dummy.Time: data.frame() renames "Dummy Time" to Dummy.Time by default
Intervals$DummySum =
  sapply(seq_len(nrow(Intervals)), function(i) sum(
    x$Dummy[
      x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
    ]
  ))
Finally, you can turn this into a more general function, as follows:
sum_in_intervals = function(interval_start, interval_end, times, values, na.rm = FALSE) {
  # each interval needs a start and an end; each time needs a value
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))
  return(
    sapply(seq_along(interval_start), function(i) sum(
      values[
        times >= interval_start[i] & times < interval_end[i]
      ],
      na.rm = na.rm
    ))
  )
}
# the time column is Dummy.Time, since data.frame() renames "Dummy Time" by default
Intervals$DummySum = sum_in_intervals(Intervals$From, Intervals$To, x$Dummy.Time, x$Dummy)
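For larger inputs, the per-interval loop can be replaced by a single matrix of comparisons. The following is a minimal sketch and not part of the original answer: sum_in_intervals_outer is a hypothetical name, and the small vectors are made up for illustration. It builds an intervals-by-times logical matrix with outer() and sums the matching values with a matrix product, using the same half-open [start, end) rule as above.

```r
# Hypothetical vectorised variant; assumes no NA in times or values.
sum_in_intervals_outer <- function(interval_start, interval_end, times, values) {
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))
  # inside[i, j] is TRUE when times[j] lies in [interval_start[i], interval_end[i])
  inside <- outer(interval_start, times, "<=") & outer(interval_end, times, ">")
  # multiply by 1 to turn the logical matrix numeric, then sum values per row
  drop((inside * 1) %*% values)
}

# Made-up illustration data
starts <- c(0, 1, 2)
ends   <- c(2, 3, 4)
times  <- c(0.5, 1.5, 2.5, 2.5)
values <- c(1, 10, 100, 1000)

result <- sum_in_intervals_outer(starts, ends, times, values)
result
# 11 1110 1100
```

Unlike the sapply version, this sketch has no na.rm option, and its memory use grows with the number of intervals times the number of observations, so it is a trade of memory for speed rather than a drop-in replacement.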