Sumifs between two numbers across different dataframes

I have two data frames. One holds intervals with "From" and "To" columns, as follows:

Intervals <- data.frame("From" = c(0.0000,0.0069,0.0139,0.0208,0.0278,0.0347,0.0417,0.0486,0.0556,0.0625,0.0694,0.0764,0.0833),
                        "To" = c(0.0410,0.0479,0.0549,0.0618,0.0688,0.0757,0.0826,0.0896,0.0965,0.1035,0.1104,0.1174,0.1243))

The second data frame is:

x <- data.frame("Dummy" = c(0,1,0,0,0,0,0,0,1,0,0,0,0), 
                "Dummy Time" = c(0,0,0.006944444,0.006944444,0.010416667,0.010416667,0.013888889,0.013888889,0.020833333,0.024305556,0.027777778,0.03125,0.03125))

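So I basically want to do a SUMIFS in R: sum the Dummy variable wherever Dummy Time falls between (or is equal to) the From and To of a row in the Intervals df. This is easy in Excel, but I am very new to R.

cbind doesn't work because Intervals and x have different rows. Essentially, the intervals are just a standard day, and I want to create a new column on Intervals that shows the total number of dummies that occurred within each time period.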

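The most transparent way I can think of is:

n_interval = nrow(Intervals)
Intervals$DummySum = numeric(n_interval)
for(i in seq_len(n_interval)) {
  # data.frame() stores the "Dummy Time" column under the name Dummy.Time
  # select rows of x whose time lies in [From, To); use <= for an inclusive upper bound
  ind_i = x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
  Intervals$DummySum[i] = sum(x$Dummy[ind_i])
}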

This simply loops over all intervals, identifies the dummies that fall within each interval, and sums their values.
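One detail worth checking when running this against the question's data: with its default `check.names = TRUE`, `data.frame()` turns the column name "Dummy Time" into `Dummy.Time`, so that is the name the loop has to use. The sketch below is self-contained, copies the data from the question, and shows the sums the loop produces with the half-open comparison `[From, To)`.

```r
# Data copied from the question.
Intervals <- data.frame(
  "From" = c(0.0000, 0.0069, 0.0139, 0.0208, 0.0278, 0.0347, 0.0417,
             0.0486, 0.0556, 0.0625, 0.0694, 0.0764, 0.0833),
  "To"   = c(0.0410, 0.0479, 0.0549, 0.0618, 0.0688, 0.0757, 0.0826,
             0.0896, 0.0965, 0.1035, 0.1104, 0.1174, 0.1243)
)
x <- data.frame(
  "Dummy" = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  "Dummy Time" = c(0, 0, 0.006944444, 0.006944444, 0.010416667,
                   0.010416667, 0.013888889, 0.013888889, 0.020833333,
                   0.024305556, 0.027777778, 0.03125, 0.03125)
)

# The space in "Dummy Time" has been replaced by a dot.
names(x)  # "Dummy" "Dummy.Time"

Intervals$DummySum <- numeric(nrow(Intervals))
for (i in seq_len(nrow(Intervals))) {
  # rows of x whose time falls in [From, To) for interval i
  ind_i <- x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
  Intervals$DummySum[i] <- sum(x$Dummy[ind_i])
}

Intervals$DummySum  # 2 1 1 1 0 0 0 0 0 0 0 0 0
```

The two dummies equal to 1 occur at times 0 and 0.020833333, so only the first four intervals pick anything up.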

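If you don't like for loops, you can use sapply:

# the "Dummy Time" column is stored as Dummy.Time by data.frame()
Intervals$DummySum = 
  sapply(seq_len(nrow(Intervals)), function(i) sum(
    x$Dummy[
      x$Dummy.Time >= Intervals$From[i] & x$Dummy.Time < Intervals$To[i]
      ]
  ))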

Finally, you can turn this into a more general function, like so:

sum_in_intervals = function(interval_start, interval_end, times, values, na.rm = FALSE) {
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))

  return(
    sapply(seq_along(interval_start), function(i) sum(
      values[
        times >= interval_start[i] & times < interval_end[i]
      ], 
      na.rm = na.rm
    ))
  )
}

# the "Dummy Time" column is stored as Dummy.Time by data.frame()
Intervals$DummySum = sum_in_intervals(Intervals$From, Intervals$To, x$Dummy.Time, x$Dummy)
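
If the explicit iteration over intervals is a concern, the same sums can probably be computed without `sapply` by building a membership matrix with `outer()`. This is only a sketch of a different technique, not part of the answer above: the name `sum_in_intervals_outer` is hypothetical, it ignores `NA` handling, and it allocates a matrix of size (number of intervals) x (number of times), so it may not suit very large inputs.

```r
# Hypothetical helper: sums `values` per interval via a logical membership matrix.
sum_in_intervals_outer <- function(interval_start, interval_end, times, values) {
  stopifnot(length(interval_start) == length(interval_end))
  stopifnot(length(times) == length(values))

  # inside[i, j] is TRUE when times[j] lies in [interval_start[i], interval_end[i])
  inside <- outer(interval_start, times, "<=") & outer(interval_end, times, ">")

  # convert TRUE/FALSE to 1/0, then each row dotted with `values` is one interval's sum
  drop((inside * 1) %*% values)
}

# Small made-up example: intervals [0, 2) and [1, 3)
sum_in_intervals_outer(c(0, 1), c(2, 3), times = c(0, 1, 2), values = c(10, 20, 30))
# 30 50
```

With the question's data it would be called as `sum_in_intervals_outer(Intervals$From, Intervals$To, x$Dummy.Time, x$Dummy)`.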
