Calculate rowSums for three-dimensional array without for-loop/apply

Take the array b_array:

set.seed(123)
a_mtx = matrix(1:15,ncol=5)
b_mtx = matrix(seq(1,5,length.out=30),ncol=5)

b_array = 
  array(
    b_mtx,
    dim = 
      c(
        nrow(b_mtx),
        ncol(b_mtx), 
        nrow(a_mtx)
      )
    )

If I want to calculate the sum of each column of each "slice" or "sheet" of b_array, I can use colSums with its dims argument:

colSums(b_array, dims = 1)
#          [,1]      [,2]      [,3]
#[1,]  8.068966  8.068966  8.068966
#[2,] 13.034483 13.034483 13.034483
#[3,] 18.000000 18.000000 18.000000
#[4,] 22.965517 22.965517 22.965517
#[5,] 27.931034 27.931034 27.931034
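As a rough illustration of what the dims argument of colSums does, the sketch below rebuilds the array from the question (hard-coding the third dimension as 3, which is an assumption standing in for nrow(a_mtx)) and compares dims = 1 with dims = 2. The names cs1 and cs2 are only illustrative.

```r
# Rebuild the array from the question: three identical 6 x 5 slices.
b_mtx   <- matrix(seq(1, 5, length.out = 30), ncol = 5)
b_array <- array(b_mtx, dim = c(nrow(b_mtx), ncol(b_mtx), 3))

# dims = 1: collapse axis 1 only, keep axes 2 and 3.
cs1 <- colSums(b_array, dims = 1)
dim(cs1)     # 5 3

# dims = 2: collapse axes 1 and 2, keep axis 3 (one total per slice).
cs2 <- colSums(b_array, dims = 2)
length(cs2)  # 3

# Cross-check dims = 1 against the slower apply() formulation.
stopifnot(isTRUE(all.equal(cs1, apply(b_array, c(2, 3), sum))))
```

In other words, colSums always sums away the leading dims axes and keeps the trailing ones.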

To do the same for row sums, I cannot use the dims argument of rowSums, because it is interpreted differently there, so I resort to apply:

apply(b_array, 3, rowSums)
#         [,1]     [,2]     [,3]
#[1,] 13.27586 13.27586 13.27586
#[2,] 13.96552 13.96552 13.96552
#[3,] 14.65517 14.65517 14.65517
#[4,] 15.34483 15.34483 15.34483
#[5,] 16.03448 16.03448 16.03448
#[6,] 16.72414 16.72414 16.72414

I want to run the same calculation on much larger arrays, where apply and other loop-based methods are too slow.

Are there any alternative, truly vectorized methods?

The default thinking (I believe) about the MARGIN= (second) argument to apply is that it means "the axis that is reduced" (when aggregating ... simplifying for effect here). That is not how it works: MARGIN names the axes that are kept, and every other axis is collapsed by the function.

The effective equivalent of colSums(ary) for a matrix, for example, is apply(ary, 2, sum), meaning "keep axis 2 (the columns) and collapse axis 1". (colSums is actually done internally, not with apply.) To extend that logic to your b_array, you want the 1st and 3rd axes to remain and the 2nd to be summed away, so doing

apply(b_array, c(1,3), sum)
#          [,1]     [,2]     [,3]
# [1,] 13.27586 13.27586 13.27586
# [2,] 13.96552 13.96552 13.96552
# [3,] 14.65517 14.65517 14.65517
# [4,] 15.34483 15.34483 15.34483
# [5,] 16.03448 16.03448 16.03448
# [6,] 16.72414 16.72414 16.72414

is about as efficient as apply gets (I think) for this kind of per-slice row sum on an n-dimensional array.
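For readers who find MARGIN hard to picture, the following sketch spells out what apply(b_array, c(1,3), sum) computes as an explicit double loop. It assumes the array from the question (third dimension hard-coded as 3); res and manual are illustrative names, and the loop is there for explanation only, not as a faster method.

```r
# Rebuild the array from the question: three identical 6 x 5 slices.
b_mtx   <- matrix(seq(1, 5, length.out = 30), ncol = 5)
b_array <- array(b_mtx, dim = c(nrow(b_mtx), ncol(b_mtx), 3))

# MARGIN = c(1, 3): keep axes 1 and 3, sum over axis 2.
res <- apply(b_array, c(1, 3), sum)

# Hand-written equivalent: one sum per (row i, slice k).
manual <- matrix(NA_real_, nrow = dim(b_array)[1], ncol = dim(b_array)[3])
for (i in seq_len(dim(b_array)[1])) {
  for (k in seq_len(dim(b_array)[3])) {
    manual[i, k] <- sum(b_array[i, , k])
  }
}

stopifnot(isTRUE(all.equal(res, manual)))
```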

Edit:

@markus' use of aperm is faster across a wide range of array sizes, though the gap narrows for larger arrays (at n = 1000 the medians below are roughly 49.6 ms for aperm versus 54.3 ms for apply).

ns <- c(10,50,100,1000)
set.seed(123)
arrays <- lapply(ns, function(n) array(runif(3*n*n), dim=c(n,n,3)))

mapply(identical,
       lapply(arrays, function(a) t(colSums(aperm(a, perm = c(2, 3, 1))))),
       lapply(arrays, function(a) apply(a, c(1,3), sum)))
# [1] TRUE TRUE TRUE TRUE

library(microbenchmark)
microbenchmark(
  aperm10 = t(colSums(aperm(arrays[[1]], perm = c(2, 3, 1)))),
  aperm50 = t(colSums(aperm(arrays[[2]], perm = c(2, 3, 1)))),
  aperm100 = t(colSums(aperm(arrays[[3]], perm = c(2, 3, 1)))),
  aperm1000 = t(colSums(aperm(arrays[[4]], perm = c(2, 3, 1)))),
  apply10 = apply(arrays[[1]], c(1,3), sum),
  apply50 = apply(arrays[[2]], c(1,3), sum),
  apply100 = apply(arrays[[3]], c(1,3), sum),
  apply1000 = apply(arrays[[4]], c(1,3), sum),
  times=10
)
# Unit: microseconds
#       expr     min      lq     mean   median      uq     max neval
#    aperm10    19.1    25.5    46.74    39.55    59.2   105.8    10
#    aperm50    55.7    77.2    96.36    94.30   115.6   149.8    10
#   aperm100   231.2   247.2   267.14   258.35   295.5   301.8    10
#  aperm1000 47282.5 47568.4 49235.19 49581.85 50118.4 52034.4    10
#    apply10    53.7    59.1    78.42    63.15   105.6   123.5    10
#    apply50   263.9   282.3   318.08   306.60   366.4   383.0    10
#   apply100   637.7   686.6   712.65   710.75   741.5   799.7    10
#  apply1000 40173.7 52735.7 52170.08 54349.65 55692.9 57375.9    10

(I haven't tested memory use.)

Another option, using aperm:

t(colSums(aperm(b_array, perm = c(2, 3, 1))))
#         [,1]     [,2]     [,3]
#[1,] 13.27586 13.27586 13.27586
#[2,] 13.96552 13.96552 13.96552
#[3,] 14.65517 14.65517 14.65517
#[4,] 15.34483 15.34483 15.34483
#[5,] 16.03448 16.03448 16.03448
#[6,] 16.72414 16.72414 16.72414
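Broken into steps, the one-liner above appears to work as follows: aperm moves the axis to be summed (axis 2) to the front, colSums collapses that leading axis, and t restores the row-by-slice orientation. The sketch below walks through it on the question's array (third dimension hard-coded as 3 by assumption). The last line is a variant that is not from the answer: it moves axis 2 to the end instead and uses rowSums with dims = 2, which avoids the final transpose. I have not benchmarked it.

```r
# Rebuild the array from the question: three identical 6 x 5 slices.
b_mtx   <- matrix(seq(1, 5, length.out = 30), ncol = 5)
b_array <- array(b_mtx, dim = c(nrow(b_mtx), ncol(b_mtx), 3))

# Step 1: reorder axes so the one to be summed (old axis 2) comes first.
p <- aperm(b_array, perm = c(2, 3, 1))
dim(p)    # 5 3 6

# Step 2: colSums collapses the leading axis, leaving slice x row.
cs <- colSums(p)
dim(cs)   # 3 6

# Step 3: transpose back to row x slice.
out <- t(cs)
dim(out)  # 6 3

# Variant (assumption, not from the answer): put axis 2 last, then let
# rowSums keep the first two axes and sum away the third.
alt <- rowSums(aperm(b_array, perm = c(1, 3, 2)), dims = 2)

stopifnot(isTRUE(all.equal(out, alt)))
```

Both forms make a full permuted copy of the array, so memory use is worth checking on very large inputs.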
