Summing the difference of all values of one vector that are less than the values in another

I have the code below, which loops through a sequence b and, for each value n, selects the values of a that are less than or equal to n and sums their differences from n. For large datasets, this can take a long time. Is there a way to vectorize this, without looping through the sequence, to improve performance?

a <- seq(1, 10, by=0.25)
b <- seq(1, 10, by=1)

c <- vector('list', length(b))

i <- 1
for (n in b){
    c[[i]] <- sum(n - a[n >= a])
    i <- i + 1
}

data.frame(c)

I've tried to use data.table to bin the data and find the difference, but cannot figure out how to calculate the difference from every value less than the bin value.

library(data.table)

min.n <- 1
max.n <- 10 
a <- data.table(seq(min.n, max.n, by=0.5))
colnames(a) <- 'a'
b <- seq(min.n+1, max.n+1, by=1)

bins <- findInterval(a$a,b)
a[,bins:= bins+2]
a[, diff:= bins - a]

Here is an option using a data.table rolling join:

library(data.table)
A <- data.table(a, key="a")
B <- data.table(b, key="b")

A[, c("N", "cs") := .(.I, cumsum(a))]

A[B, on=.(a=b), roll=Inf, N * b - cs]

sum(a[a <= n]) can be replaced with a cumulative sum (cs here), and the rolling join finds, for each b, the last a that is less than or equal to b. sum(n - a[a <= n]) can then be expanded using the fact that the sum of a constant equals the number of elements in the summation times the constant: sum(n - a[a <= n]) = N * n - cs, where N is the number of elements of a that are <= n.

Output:

[1]   0.0   2.5   9.0  19.5  34.0  52.5  75.0 101.5 132.0 166.5
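
The identity behind the rolling-join answer can also be checked without data.table. The following is only a minimal base R sketch using the a and b from the question; the names a_sorted, cs, N, fast and slow are illustrative and do not appear in the original answer, and findInterval stands in for the rolling join.

```r
# Example vectors from the question
a <- seq(1, 10, by = 0.25)
b <- seq(1, 10, by = 1)

a_sorted <- sort(a)             # findInterval() needs a sorted vector
cs <- cumsum(a_sorted)          # cs[k] = sum of the k smallest values of a
N <- findInterval(b, a_sorted)  # N[j] = number of a values <= b[j]

# sum(n - a[a <= n]) = N * n - sum(a[a <= n]);
# padding cs with a leading 0 covers the case N == 0
fast <- N * b - c(0, cs)[N + 1]

# Reference result: the explicit loop from the question
slow <- vapply(b, function(n) sum(n - a[a <= n]), numeric(1))

all.equal(fast, slow)
```

Unlike the rolling join, which yields NA when a b value is smaller than every a, this sketch returns 0 in that case, matching the loop in the question.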

Edit: some timings for reference.

Timing code:

set.seed(0L)
library(data.table)
n <- 1e5L
a <- rnorm(n)
b <- rnorm(n/10L)
A <- data.table(a, key="a")
B <- data.table(b, key="b")

mtd0 <- function() A[B, on = .(a <= b), sum(i.b - x.a), by = .EACHI]$V1

mtd1 <- function() {
    A[, c("N", "cs") := .(.I, cumsum(a))]
    A[B, on=.(a=b), roll=Inf, N * b - cs]
}

all.equal(mtd0(), mtd1())
#[1] TRUE

microbenchmark::microbenchmark(times=1L, mtd0(), mtd1())

Timings:

Unit: milliseconds
   expr         min          lq        mean      median          uq         max neval
 mtd0() 2998.208000 2998.208000 2998.208000 2998.208000 2998.208000 2998.208000     1
 mtd1()    7.807637    7.807637    7.807637    7.807637    7.807637    7.807637     1

With data.table, this can be achieved by aggregating in a non-equi join:

library(data.table)
data.table(a)[data.table(b), on = .(a <= b), sum(i.b - x.a), by = .EACHI]$V1
 [1] 0.0 2.5 9.0 19.5 34.0 52.5 75.0 101.5 132.0 166.5

In a way, it is similar to MattB's approach, but it combines the Cartesian product CJ() and the subsetting in a single non-equi join, thereby avoiding the creation of rows that would be filtered out afterwards.
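
To get a feel for how many rows the cross join would create only to discard, here is a small base R sketch on the question's data. The names n_cross and n_kept are illustrative, and outer() is used purely for counting (it builds the full matrix itself, so it is not a substitute for the non-equi join).

```r
a <- seq(1, 10, by = 0.25)
b <- seq(1, 10, by = 1)

# Rows a full Cartesian product such as CJ(a, b) would materialise
n_cross <- length(a) * length(b)

# Pairs that actually satisfy a <= b and contribute to the sums
n_kept <- sum(outer(a, b, "<="))

c(cross = n_cross, kept = n_kept)
```

For these vectors, 190 of the 370 pairs survive the filter, so almost half of the cross product is discarded.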

Note that the x. prefix is required to pick the a column from the first data.table.


Alternatively, sum(i.b - x.a) can be rewritten as .N * b - sum(x.a), where the special symbol .N denotes the number of rows in a group.

data.table(a)[data.table(b), on = .(a <= b), .N * b - sum(x.a), by = .EACHI]$V1
 [1] 0.0 2.5 9.0 19.5 34.0 52.5 75.0 101.5 132.0 166.5

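A base R solution with findInterval, which is fast. It assumes a is sorted in increasing order, which findInterval requires.

i <- findInterval(b, a)  # i[j] = number of elements of a that are <= b[j]
# seq_len() also handles i[j] == 0, where 1:i[j] would expand to c(1, 0)
sapply(seq_along(i), function(j) sum(b[j] - a[seq_len(i[j])]))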
# [1]   0.0   2.5   9.0  19.5  34.0  52.5  75.0 101.5 132.0 166.5

Something like this?

library(data.table)
a <- seq(1, 10, by=0.25)
b <- seq(1, 10, by=1)

all.combinations <- CJ(a, b)  # Get all possible combinations
all.combinations[b>=a, sum(b-a), by=b]  # Filter for b>=a, then sum the difference for each value of b
