How to do a rolling sum across columns in R?

Question

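roll_sum and many other methods (e.g. https://vandomed.github.io/moving_averages.html ) only compute rolling sums over rows. I have a large matrix and not enough memory to transpose it. Is there a way to apply roll_sum directly across columns?

For example: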

library(roll)

A=matrix(rnorm(10000),100)
roll_sum(A,3)

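But I want to do this across columns.

In addition, all the methods so far are implemented without multi-core processing. Can anyone provide a solution that has this feature?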

Answer 1

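Here is an Rcpp approach.

Rcpp::cppFunction("
NumericMatrix rcpp_column_roll(const NumericMatrix mat, const int n) {

  const int ncol = mat.ncol();
  const int nrow = mat.nrow();
  NumericMatrix out(nrow, ncol);
  // Positions without a complete window stay NA
  std::fill( out.begin(), out.end(), NumericVector::get_na() ) ;

  // No complete window fits in a row: return the all-NA matrix
  if (n < 1 || n > ncol) return out;

  for (int i = 0; i < nrow; i++) {
    NumericVector window(n);  // circular buffer with the last n values of row i
    double roll = 0;          // running sum of the current window
    int oldest_ind = 0;       // buffer slot holding the oldest value

    // Sum of the first window
    for (int j = 0; j < n ; j++) {
      double mat_ij = mat(i, j);
      window(j) = mat_ij;
      roll += mat_ij;
    }

    out(i, n - 1) = roll;

    // Slide the window: add the new value, drop the oldest one
    for (int j = n; j < ncol; j ++) {
      double mat_ij = mat(i, j);

      roll += mat_ij;
      roll -= window(oldest_ind);

      out(i, j) = roll;

      window(oldest_ind) = mat_ij;

      if (oldest_ind == n-1) oldest_ind = 0; else oldest_ind++;
    }
  }
  return(out);
}
")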

This is about 10 times more memory efficient than transposing the result of apply(A, 1L, roll::roll_sum, 3L), and about 50 times faster on the example dataset.

bench::mark(rcpp_column_roll(A, 3),
            t(apply(A, 1, roll::roll_sum, 3)))

## # A tibble: 2 x 13
##   expression                             min   median `itr/sec` mem_alloc
##   <bch:expr>                        <bch:tm> <bch:tm>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)             134.4us  139.7us     6641.    80.7KB
## 2 t(apply(A, 1, roll::roll_sum, 3))   7.62ms   8.91ms      101.     773KB

## With an 80 MB dataset (`rnorm(1E7)`):

##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            226ms  229ms      4.17    76.3MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 740ms  740ms      1.35   498.5MB

## 800 MB dataset (`rnorm(1E8)`):

## # A tibble: 2 x 13
##   expression                          min median `itr/sec` mem_alloc
##   <bch:expr>                        <bch> <bch:>     <dbl> <bch:byt>
## 1 rcpp_column_roll(A, 3)            3.49s  3.49s     0.286  762.94MB
## 2 t(apply(A, 1, roll::roll_sum, 3)) 9.62s  9.62s     0.104    4.84GB

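The memory savings seem to level off at around a 5-fold reduction (about 6.5-fold in the two larger runs above), and what is left is more or less the allocation of the result matrix itself.

Alternatively, we can stay closer to R and use an R loop to do a manual apply that does not need a transpose.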

out = matrix(NA_real_, nrow(A), ncol(A))
for (i in seq_len(nrow(A))) {
  out[i, ] = roll::roll_sum(A[i, ], 3L)
}

This is slightly better than transposing a regular apply. @Moody_Mudskipper has the fastest method, although the Rcpp one is the most memory efficient.

##rnorm(1e8); ncols = 1000;
# A tibble: 6 x 13
  expression               min median `itr/sec` mem_alloc `gc/sec` n_itr
  <bch:expr>             <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int>
1 rcpp_column_roll(A, 3) 3.32s  3.32s     0.301  762.94MB    0         1
2 for_loop               6.12s  6.12s     0.163    2.98GB    0.327     1
3 dww_sappy                 7s     7s     0.143    4.86GB    0.572     1
4 matStat_Moody          1.81s  1.81s     0.552    2.24GB    0.552     1
5 roll_sum_Ronak         8.34s  8.34s     0.120    4.84GB    0.360     1
6 froll_Oliver           7.75s  7.75s     0.129    4.86GB    0.516     1

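Note that if you are really short on RAM, you can change the Rcpp function to modify the input in place, which means you do not have to allocate another matrix. Otherwise, you would be best off implementing Moody's clever solution in Rcpp, since it would be faster and would only need to allocate the output matrix.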
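The sliding-window update that rcpp_column_roll performs can also be sketched in plain R by walking across the columns and updating a whole vector of running row sums at each step. This is only a minimal illustration and not one of the benchmarked methods: the name roll_cols_window is made up here, and the sketch assumes a numeric matrix without missing values and 1 <= n <= ncol(A).

```r
# Hypothetical helper: rolling sum across columns without a transpose.
# Assumes no NA in A (an NA would stay in the running sum for the rest of its row).
roll_cols_window <- function(A, n) {
  stopifnot(n >= 1, n <= ncol(A))
  out <- matrix(NA_real_, nrow(A), ncol(A))

  # Running sums for the first window (columns 1..n), one value per row
  acc <- rowSums(A[, seq_len(n), drop = FALSE])
  out[, n] <- acc

  # Slide the window one column at a time: add the new column, drop the oldest
  for (j in seq_len(ncol(A) - n) + n) {
    acc <- acc + A[, j] - A[, j - n]
    out[, j] <- acc
  }
  out
}

A <- matrix(1:25, 5)
roll_cols_window(A, 3)
```

Besides the output matrix, this only creates a few temporary vectors of length nrow(A), and every access reads a whole column, which is contiguous in R's column-major storage.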

Answer 2

Since a rolling sum can be seen as a difference of cumulative sums, we can use the {matrixStats} package to compute those cumulative sums quickly.
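For a single vector, the idea is that the window of a given width ending at position j sums to the cumulative sum at j minus the cumulative sum at j - width. A small base R sketch of that identity, using illustrative values (the names cs, shifted and roll are only for this example):

```r
x <- c(1, 6, 11, 16, 21)
width <- 3

cs <- cumsum(x)                                # 1 7 18 34 55
shifted <- c(rep(0, width), head(cs, -width))  # cumulative sums lagged by width
roll <- cs - shifted
roll[seq_len(width - 1)] <- NA                 # no complete window yet
roll
#> [1] NA NA 18 33 48
```

With floating-point data, subtracting large cumulative sums can lose a little precision compared with summing each window directly.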

A <- matrix(1:25,5)
A
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    6   11   16   21
#> [2,]    2    7   12   17   22
#> [3,]    3    8   13   18   23
#> [4,]    4    9   14   19   24
#> [5,]    5   10   15   20   25

What we cannot do because of the expensive transpose:

library(roll)
t(roll_sum(t(A),3))
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

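Using {matrixStats}:

library(matrixStats)
#> Warning: package 'matrixStats' was built under R version 4.0.3
row_roll_sum <- function(x, width) {
  nc <- ncol(x)
  stopifnot(width >= 1, width <= nc)
  out <- rowCumsums(x)  # cumulative sums along each row, no transpose needed
  if (width < nc) {
    # window sum = cumulative sum minus the cumulative sum width columns earlier
    later <- seq(width + 1, nc)
    out[, later] <- out[, later] - out[, seq_len(nc - width)]
  }
  out[, seq_len(width - 1)] <- NA  # no complete window in the first width - 1 columns
  out
}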
row_roll_sum(A, 3)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   NA   18   33   48
#> [2,]   NA   NA   21   36   51
#> [3,]   NA   NA   24   39   54
#> [4,]   NA   NA   27   42   57
#> [5,]   NA   NA   30   45   60

Answer 3

Using base R matrix indexing, we can do:

n = 3
sapply(seq_len(NCOL(A)-n+1), function(j) rowSums(A[, j:(j+n-1), drop = FALSE]))

No transpose is needed, and rowSums should be very well optimized for speed.
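The sapply call above returns only the NCOL(A) - n + 1 complete windows. If the result should have the same shape as the roll_sum output, with NA in the first n - 1 columns, a wrapper along these lines could be used. The function name roll_cols_base is made up for this sketch, and it assumes 1 <= n <= NCOL(A).

```r
# Hypothetical wrapper: same rowSums idea, written into a preallocated,
# NA-padded matrix so the output has the dimensions of the input.
roll_cols_base <- function(A, n) {
  stopifnot(n >= 1, n <= NCOL(A))
  out <- matrix(NA_real_, NROW(A), NCOL(A))
  for (j in seq_len(NCOL(A) - n + 1)) {
    # drop = FALSE keeps a matrix even when the window is a single column
    out[, j + n - 1] <- rowSums(A[, j:(j + n - 1), drop = FALSE])
  }
  out
}

A <- matrix(1:25, 5)
roll_cols_base(A, 3)
```

Each window is summed from scratch, so the cost grows with n, but nothing is transposed and only the output matrix is allocated in full.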

Answer 4

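Rolling sum by column or row

`Rcpp` function for rolling sums by column or row

Since being able to do this either by row or by column is very useful, I have included a margin argument with the same usage as in base::apply (i.e. 1 = rows, 2 = columns).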

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix matrix_rollsum(SEXP x, int n, int margin) {
  Rcpp::NumericMatrix y(x);
  int NR = y.nrow();
  int NC = y.ncol();
  NumericMatrix result(NR,NC);
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;

  if(margin==1){
    for(int i = 0; i < NR; ++i){
      NumericVector tmpvec = y(i,_);
      for(int j = 0; j < NC-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < NC; ++i){
      NumericVector tmpvec = y(_,i);  // copy of column i
      for(int j = 0; j < NR-n+1;++j){
        double s=0.0;
        for(int q=j; q<j+n;q++){
          s+=tmpvec[q];
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmark

mat_lg <- matrix(runif(1e6,1,1000),1e3,1e3)
res1 <- microbenchmark::microbenchmark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  times = 1000
)

knitr::kable(summary(res1))

expr	min	lq	mean	median	uq	max	neval	cld
matrix_rollsum	9.128677	10.38814	15.78466	13.43251	17.54006	71.10719	1000	a
rcpp_colum_roll	23.195918	26.54276	33.65227	30.43353	38.11125	113.20687	1000	b
apply_rollsum	58.027111	72.66437	87.12061	80.50741	94.53146	255.69353	1000	c
for_loop	56.408078	71.78122	85.21565	79.10471	89.47916	269.55304	1000	c
row_roll_sum	8.309067	10.40819	15.62686	12.93160	17.21942	81.76514	1000	a

Memory allocation benchmark

res2 <- bench::mark(
  matrix_rollsum = matrix_rollsum(mat_lg, 3,1),
  rcpp_colum_roll = rcpp_column_roll(mat_lg,3), 
  apply_rollsum = apply_rollsum(mat_lg,3),
  for_loop = for_loop(mat_lg,3),
  row_roll_sum = row_roll_sum(mat_lg,width = 3),
  iterations = 1000
)

summary(res2)[,1:9]

# A tibble: 5 x 6
  expression           min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum    9.11ms   11.1ms      79.7   15.31MB    29.0 
2 rcpp_colum_roll   23.2ms   28.6ms      32.2    7.63MB     3.74
3 apply_rollsum    53.94ms   67.1ms      13.7   52.18MB   188.  
4 for_loop         55.18ms     69ms      13.2   33.13MB    17.8 
5 row_roll_sum      8.28ms   10.5ms      78.3   22.87MB    51.5

Benchmark plot

p1 <- ggplot2::autoplot(res1)
p2 <- ggplot2::autoplot(res2)

library(patchwork)
p1/p2

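Edit

Cole makes a good point. Why copy a large matrix? Wouldn't working on the original object use less memory? So I rewrote the Rcpp function to work on the original object.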

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
Rcpp::NumericMatrix test(NumericMatrix x, int n, int margin) {

  Rcpp::NumericMatrix result(x.nrow(),x.ncol());
  std::fill( result.begin(), result.end(), NumericVector::get_na() ) ;
  double s=0.0;

  if(margin==1){
    for(int i = 0; i < x.nrow(); ++i){
      for(int j = 0; j < x.ncol()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(i,q);
        }
        result(i,j+n-1) = s;
        s = 0.0;
      }}}

  if(margin==2){

    for(int i = 0; i < x.ncol(); ++i){
      for(int j = 0; j < x.nrow()-n+1;++j){
        for(int q=j; q<j+n;q++){
          s+=x(q,i);
        }
        result(j+n-1,i) = s;
        s = 0.0;
      }}}

  return result;
}

Benchmark

As Cole suspected, the new function allocates half the memory of the original one, but surprisingly it is about 3 times slower.

expr	min	lq	mean	median	uq	max	neval	cld
matrix_rollsum	9.317332	10.84904	15.47414	13.75330	16.36336	101.6147	1000	a
test	34.498511	40.08057	47.49839	43.26564	48.34093	211.3246	1000	b

# A tibble: 2 x 6
  expression          min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 matrix_rollsum   9.15ms   10.1ms      93.7   15.31MB    33.4 
2 test             34.1ms   35.4ms      27.5    7.63MB     3.93

Answer 5

Perhaps you can try using apply row-wise on the matrix:

apply(A, 1, zoo::rollsumr, 3, fill = NA)
#Or
#apply(A, 1, roll::roll_sum, 3)

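However, note that this gives you the output in column order: each row's result becomes a column, so the result is transposed relative to the input. For example,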

A <- matrix(1:10, ncol = 5)
apply(A, 1, zoo::rollsumr, 3, fill = NA)

#     [,1] [,2]
#[1,]   NA   NA
#[2,]   NA   NA
#[3,]    9   12
#[4,]   15   18
#[5,]   21   24

Answer 6

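Both of the answers provided are equally good here. The question seems a bit unclear about whether you are looking for a rolling sum over columns or over rows, or whether your output should be transposed by design. If it is the latter, I suggest looking at Cole's answer and swapping the dimensions and indices of the output matrix.

That said, if what you are looking for is a column-wise operation and output, you can simply use the froll* functions from the data.table package, which are designed for speed and memory efficiency.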

library(data.table)
mat <- matrix(rnorm(1e8), ncol = 10)
res <- frollsum(mat, 3)

However, I believe the performance of the roll package is somewhat similar.

Answer 7

Here is a base R option that uses embed for the rolling sum:

out <- NA * A
out[, -(1:2)] <- t(sapply(1:nrow(A), function(k) rowSums(embed(A[k, ], 3))))

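Or

out <- NA * A
u <- embed(t(A), 3)
# Columns of u come in 3 blocks of nrow(A) columns, one block per lag
blocks <- split(seq_len(ncol(u)), ceiling(seq_len(ncol(u)) / nrow(A)))
# Adding the blocks elementwise gives one row per window and one column per row of A
out[, -(1:2)] <- t(Reduce(`+`, lapply(blocks, function(k) u[, k, drop = FALSE])))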
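To see what the embed-based code relies on: embed(x, 3) puts each run of three consecutive values in one row, most recent value first, so the row sums of that matrix are the rolling sums. A small illustration on a made-up vector:

```r
x <- 1:5
w <- embed(x, 3)
w
#>      [,1] [,2] [,3]
#> [1,]    3    2    1
#> [2,]    4    3    2
#> [3,]    5    4    3
rowSums(w)
#> [1]  6  9 12
```

The result has length(x) - 2 entries, which is why the code above writes into out[, -(1:2)] and leaves the first two columns as NA.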
