如何对R中的列进行滚动求和？

Question

roll_sum and many other methods (eg https://vandomed.github.io/moving_averages.html ) are for summing over rows only. roll_sum 和许多其他方法（例如https://vandomed.github.io/moving_averages.html ）仅用于对行求和。 I have a large matrix that I don't have enough memory to transpose it.我有一个很大的矩阵，我没有足够的内存来转置它。 Is there a way I can do roll_sum over columns directly?有没有办法可以直接对列进行 roll_sum ？

For example:例如：

library(roll)

A=matrix(rnorm(10000),100)
roll_sum(A,3)

But I want to do this across columns.但我想跨列执行此操作。

To follow up, all of the methods so far are implemented without using multi-core processing.接下来，到目前为止所有的方法都是在不使用多核处理的情况下实现的。 Can anyone offer a solution with this feature?任何人都可以提供具有此功能的解决方案吗？

Answer 1

Here is an rcpp approach.这是一个rcpp方法。

Rcpp::cppFunction("
NumericMatrix rcpp_column_roll(const NumericMatrix mat, const int n) {

  const int ncol = mat.ncol();
  const int nrow = mat.nrow();
  NumericMatrix out(nrow, ncol);
  std::fill( out.begin(), out.end(), NumericVector::get_na() ) ;

  
  for (int i = 0; i < nrow; i++) {
    NumericVector window(n);
    double roll = 0;
    int oldest_ind = 0;
    
    for (int j = 0; j < n ; j++) {
      double mat_ij = mat(i, j); 
      window(j) = mat_ij;
      roll += mat_ij;
    }
    
    out(i, n - 1) = roll;

    for (int j = n; j < ncol; j ++) {
      double mat_ij = mat(i, j); 
      
      roll += mat_ij;
      roll -= window(oldest_ind);
      
      out(i, j) = roll;
      
      window(oldest_ind) = mat_ij;
      
      if (oldest_ind == n-1) oldest_ind = 0; else oldest_ind++;
    }
  }
  return(out);
}
")

This is about 10x more memory efficient than transposing the result of apply(A, 1L, roll::roll_sum, 3L) and about 50x faster for the sample dataset.这比转置apply(A, 1L, roll::roll_sum, 3L)的结果大约高 10 倍的内存效率apply(A, 1L, roll::roll_sum, 3L)并且对于示例数据集快大约 50 倍。

bench::mark(rcpp_column_roll(A, 3),
            t(apply(A, 1, roll::roll_sum, 3)))

## # A tibble: 2 x 13
##   expression                             min   median `itr/sec` mem_alloc
##   <bch:expr>                        <bch:tm> <bch:tm>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)             134.4us  139.7us     6641.    80.7KB
## 2 t(apply(A, 1, roll::roll_sum, 3))   7.62ms   8.91ms      101.     773KB

## With an 80 MB dataset (`rnorm(1E7)`):

##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            226ms  229ms      4.17    76.3MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 740ms  740ms      1.35   498.5MB

## 800 MB dataset (`rnorm(1E8)`):

## # A tibble: 2 x 13
##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            3.49s  3.49s     0.286  762.94MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 9.62s  9.62s     0.104    4.84GB

The memory savings seem to be stabilizing at about a 5-fold reduction and is more-or-less the allocation of the result matrix itself.内存节省似乎稳定在减少 5 倍左右，并且或多或少是结果矩阵本身的分配。

Alternatively, we can approach it more R-like and use an R loop to make a manual apply that does not need to be transposed.或者，我们可以更接近 R 并使用 R 循环来进行不需要转置的手动apply 。

out = matrix(NA_real_, nrow(A), ncol(A))
for (i in seq_len(nrow(A))) {
  out[i, ] = roll::roll_sum(A[i, ], 3L)
}

Is is moderately better than transposing the regular apply . Is 比转置常规apply略好。 @Moody_Mudskipper has the fastest approach although rcpp is the most memory efficient. @Moody_Mudskipper 拥有最快的方法，尽管rcpp是内存效率最高的。

##rnorm(1e8); ncols = 1000;
# A tibble: 6 x 13
  expression               min median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>             <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int>
1 rcpp_column_roll(A, 3) 3.32s  3.32s     0.301  762.94MB    0         1
2 for_loop               6.12s  6.12s     0.163    2.98GB    0.327     1
3 dww_sappy                 7s     7s     0.143    4.86GB    0.572     1
4 matStat_Moody          1.81s  1.81s     0.552    2.24GB    0.552     1
5 roll_sum_Ronak         8.34s  8.34s     0.120    4.84GB    0.360     1
6 froll_Oliver           7.75s  7.75s     0.129    4.86GB    0.516     1

Note if you are really short on RAM, you can change the Rcpp function to modify the input directly, meaning you do not have to allocate another matrix.请注意，如果您的 RAM 确实不足，您可以更改 Rcpp 函数以直接修改输入，这意味着您不必分配另一个矩阵。 Otherwise you may be better off implementing Moody's clever solution in Rcpp as it would be faster and only need to allocate the out matrix.否则，您最好在 Rcpp 中实施穆迪巧妙的解决方案，因为它会更快，并且只需要分配输出矩阵。

Answer 2

Since a rolling sum can be seen as a subtraction of cumsums we can use the package {MatrixStats} which does these cumsums fast.由于滚动总和可以看作是累积和的减法，我们可以使用包{MatrixStats}来快速处理这些累积和。

A <- matrix(1:25,5)
A
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   11   16   21
#> [2,]    2    7   12   17   22
#> [3,]    3    8   13   18   23
#> [4,]    4    9   14   19   24
#> [5,]    5   10   15   20   25

What you can't do because of costly transpose :由于昂贵的转置而无法执行的操作：

library(roll)
t(roll_sum(t(A),3))
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

with {MatrixStats}使用{MatrixStats}

library(matrixStats)
#> Warning: le package 'matrixStats' a été compilé avec la version R 4.0.3
row_roll_sum <- function(x, width) {
out <- rowCumsums(x)
out[,seq(width+1,ncol(out))] <- out[,seq(width+1,ncol(out))] -  out[,seq(ncol(out)-width)]
out[,seq(width-1)] <- NA
out
}
row_roll_sum(A, 3)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

Answer 3

Using base R matrix indexing we can do使用基本 R 矩阵索引我们可以做

n = 3
sapply(seq_len(NCOL(A)-n+1), function(j) rowSums(A[, j:(j+n-1)]))

No transpose required, and rowSums should be pretty optimized for speed.不需要转置，并且rowSums应该针对速度进行了非常优化。

Answer 4

Rolling Sums by Columns or Rows按列或行滚动总和

`Rcpp` function for roll sums by column or row按列或行滚动总和的`Rcpp`函数

Since it's pretty useful to be able to do this by row or column, I included the margin argument with the same usage seen in base::apply (ie 1=rows, 2=columns).由于能够按行或按列执行此操作非常有用，因此我包含了与base::apply相同用法的margin参数（即 1=rows, 2=columns）。

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_rollsum(SEXP x, int n, int margin) {
  Rcpp::NumericMatrix y(x);
  int NR = y.nrow();
  int NC = y.ncol();
  NumericMatrix result(NR,NC);
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;

  if(margin==1){
    for(int i = 0; i < NR; ++i){
      NumericVector tmpvec = y(i,_);
      for(int j = 0; j < NC-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < NC; ++i){
      NumericVector tmpvec = y(i,_);
      for(int j = 0; j < NR-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmarks基准

mat_lg <- matrix(runif(1e6,1,1000),1e3,1e3)
res1 <- microbenchmark::microbenchmark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  times = 1000
)

knitr::kable(summary(res1))

expr表达式	min分钟	lq lq	mean意思	median中位数	uq uq	max最大限度	neval内瓦尔	cld CLD
matrix_rollsum matrix_rollsum	9.128677 9.128677	10.38814 10.38814	15.78466 15.78466	13.43251 13.43251	17.54006 17.54006	71.10719 71.10719	1000 1000	a一种
rcpp_colum_roll rcpp_colum_roll	23.195918 23.195918	26.54276 26.54276	33.65227 33.65227	30.43353 30.43353	38.11125 38.11125	113.20687 113.20687	1000 1000	b乙
apply_rollsum apply_rollsum	58.027111 58.027111	72.66437 72.66437	87.12061 87.12061	80.50741 80.50741	94.53146 94.53146	255.69353 255.69353	1000 1000	c C
for_loop for_loop	56.408078 56.408078	71.78122 71.78122	85.21565 85.21565	79.10471 79.10471	89.47916 89.47916	269.55304 269.55304	1000 1000	c C
row_roll_sum row_roll_sum	8.309067 8.309067	10.40819 10.40819	15.62686 15.62686	12.93160 12.93160	17.21942 17.21942	81.76514 81.76514	1000 1000	a一种

Benchmarks with memory allocation内存分配基准

res2 <- bench::mark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  iterations = 1000
)

summary(res2)[,1:9]

# A tibble: 5 x 6
  expression           min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum    9.11ms   11.1ms      79.7   15.31MB    29.0 
2 rcpp_colum_roll   23.2ms   28.6ms      32.2    7.63MB     3.74
3 apply_rollsum    53.94ms   67.1ms      13.7   52.18MB   188.  
4 for_loop         55.18ms     69ms      13.2   33.13MB    17.8 
5 row_roll_sum      8.28ms   10.5ms      78.3   22.87MB    51.5

Benchmark Plots基准图

p1 <- ggplot2::autoplot(res1)
p2 <- ggplot2::autoplot(res2)

library(patchwork)
p1/p2

Edit编辑

Cole brought up a great point.科尔提出了一个很好的观点。 Why copy a large matrix?为什么要复制一个大矩阵？ Wouldn't working on the original object utilize less memory?处理原始对象不会占用更少的内存吗？ So I rewrote the Rcpp function to use the original object.所以我重写了Rcpp函数以使用原始对象。

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix test(NumericMatrix x, int n, int margin) {

  Rcpp::NumericMatrix result(x.nrow(),x.ncol());
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;
  double s=0.0;

  if(margin==1){
    for(int i = 0; i < x.nrow(); ++i){
      for(int j = 0; j < x.ncol()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(i,q);
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < x.ncol(); ++i){
      for(int j = 0; j < x.nrow()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(i,q);
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmarks基准

As Cole suspected, the new function allocated half the memory as the original, yet it was suprisingly 3x slower.正如 Cole 所怀疑的那样，新函数分配的内存是原始函数的一半，但令人惊讶的是它慢了 3 倍。

expr表达式	min分钟	lq lq	mean意思	median中位数	uq uq	max最大限度	neval内瓦尔	cld CLD
matrix_rollsum matrix_rollsum	9.317332 9.317332	10.84904 10.84904	15.47414 15.47414	13.75330 13.75330	16.36336 16.36336	101.6147 101.6147	1000 1000	a一种
test测试	34.498511 34.498511	40.08057 40.08057	47.49839 47.49839	43.26564 43.26564	48.34093 48.34093	211.3246 211.3246	1000 1000	b乙

# A tibble: 2 x 6
  expression          min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum   9.15ms   10.1ms      93.7   15.31MB    33.4 
2 test             34.1ms   35.4ms      27.5    7.63MB     3.93

Answer 5

Perhaps, you can try using apply on matrix row-wise :也许，您可以尝试在矩阵 row-wise 上使用apply ：

apply(A, 1, zoo::rollsumr, 3, fill = NA)
#Or
#apply(A, 1, roll::roll_sum, 3)

However, note that this will give you output in column-order format.但是，请注意，这将为您提供列顺序格式的输出。 For example,例如，

A <- matrix(1:10, ncol = 5)
apply(A, 1, zoo::rollsumr, 3, fill = NA)

#     [,1] [,2]
#[1,]   NA   NA
#[2,]   NA   NA
#[3,]    9   12
#[4,]   15   18
#[5,]   21   24

Answer 6

Both of the provided answers are equally good here.提供的两个答案在这里都一样好。 There seems to be a bit of confusion in the question whether you are looking for a rolling sum over the columns or rows, or whether your output should be transposed by design.您是否正在寻找列或行的滚动总和，或者您的输出是否应该通过设计转置，这个问题似乎有点混乱。 If you are looking for the latter I'd suggest looking through Cole's answer and inverting the dimensions and indices of the output matrix.如果您正在寻找后者，我建议您查看 Cole 的答案并反转输出矩阵的维度和索引。

That say, if what you're looking for is column wise operations and output, you could simply use the froll* functions from the data.table package, which are designed for speed and memory efficiency.也就是说，如果您正在寻找的是列操作和输出，您可以简单地使用data.table包中的froll*函数，该函数专为速度和内存效率而设计。

mat <- matrix(rnorm(1e8), ncol = 10))
frollsum = frollsum(mat, 3)

I believe the roll library has somewhat similar performance however.但是，我相信roll库的性能有些相似。

Answer 7

Here is a base R option using embed for rolling sum这是使用embed滚动总和的基本 R 选项

out <- NA * A
out[, -(1:2)] <- t(sapply(1:nrow(A), function(k) rowSums(embed(A[k, ], 3))))

or或者

out <- NA * A
u <- embed(t(A), 3)
out[, -(1:2)] <- sapply(rev(split(1:ncol(u), ceiling(seq(ncol(u)) / nrow(A)))), function(k) colSums(u[, k]))

如何对R中的列进行滚动求和？

问题描述

7 个解决方案

解决方案1
5 2020-12-03 01:38:14

解决方案2
4 2020-12-04 16:17:51

解决方案3
3 2020-12-03 15:58:23

解决方案4
3 2020-12-09 12:47:02

Rolling Sums by Columns or Rows按列或行滚动总和

`Rcpp` function for roll sums by column or row按列或行滚动总和的`Rcpp`函数

Benchmarks基准

Benchmarks with memory allocation内存分配基准

Benchmark Plots基准图

Edit编辑

Benchmarks基准

解决方案5
1 2020-11-25 06:45:30

解决方案6
1 2020-12-03 08:50:50

解决方案7
0 2020-12-10 00:01:31

如何对R中的列进行滚动求和？

问题描述

7 个解决方案

解决方案1 5 2020-12-03 01:38:14

解决方案2 4 2020-12-04 16:17:51

解决方案3 3 2020-12-03 15:58:23

解决方案4 3 2020-12-09 12:47:02

Rolling Sums by Columns or Rows按列或行滚动总和

Rcpp function for roll sums by column or row按列或行滚动总和的Rcpp函数

Benchmarks基准

Benchmarks with memory allocation内存分配基准

Benchmark Plots基准图

Edit编辑

Benchmarks基准

解决方案5 1 2020-11-25 06:45:30

解决方案6 1 2020-12-03 08:50:50

解决方案7 0 2020-12-10 00:01:31

解决方案1
5 2020-12-03 01:38:14

解决方案2
4 2020-12-04 16:17:51

解决方案3
3 2020-12-03 15:58:23

解决方案4
3 2020-12-09 12:47:02

`Rcpp` function for roll sums by column or row按列或行滚动总和的`Rcpp`函数

解决方案5
1 2020-11-25 06:45:30

解决方案6
1 2020-12-03 08:50:50

解决方案7
0 2020-12-10 00:01:31