Making comparisons of an m-dimensional array and an (m-1)-dimensional array repeated along an arbitrary dimension

I have implemented a multidimensional-array based calculation that replaces some looping code. I did a few things in this process which I think could be done better - but I'm not sure how.

One of those is comparing a resulting 3d array to a 2d array repeated along the third dimension.

items12 = c(1,2,3,4,5,6)
items3 = c(1,2,3)

m2d = outer(items12, items12, "-")
m3d = outer(items3, m2d, "*")

After some manipulation I want to compare m2d and m3d, with m2d repeated along the third dimension. I know of two options; neither seems that elegant, and I was curious whether there is a better way.

Instantiate the repeated, 3d array. Memory heavy but fast.

m2d.z.3d = outer(
  m2d, 
  rep(1, length(items3)), "*"
)

m3d - m2d.z.3d

Loop. Light but slow.

apply(m3d, 3, function(x) {
    x - m2d
})

Any suggestions? Which would you choose?

Update: an example clarifying the arbitrary-index requirement.

items.3 = c(1,2,3)
items.2 = c(1,2)

m2d = outer(items.3, items.3, "-")
m3d = outer(m2d, items.2, "*")

m3d - (m3d - items.3)

# items.3 wrapped around columns
, , 1

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3

, , 2

     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2
[3,]    3    3    3

m3d.yx = aperm(m3d, c(2,1,3))
aperm(m3d.yx - (m3d.yx - c(items.3)), c(2,1,3))

# items.3 wrapped along rows
, , 1

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3

, , 2

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3

Update

Some benchmarks of aperm in this situation.

# n is set before each run; results for n = 1 and n = 100 follow
items.3 = rep(c(1,2,3), n)
items.2 = rep(c(1,2), n)

m2d = outer(items.3, items.3, "-")
m3d = outer(m2d, items.2, "*")

funRecycle = function() # items.3 wraps around the columns (index 1, then 2, then 3 etc.)
  m3d - (m3d - c(items.3)) 
funAperm = function() { # temporarily interchange index 1 and 2 to apply along desired index
  m3d.yx = aperm(m3d, c(2,1,3))
  aperm(m3d.yx - (m3d.yx - c(items.3)), c(2,1,3)) 
}
funOuter = function() { # assign the 3d matrix
  m2d.z.3d = outer(
    m2d, 
    rep(1, length(items.2)), "*"
  )
  m3d - m2d.z.3d
}
funArray = function() { # assign the 3d matrix with array
  m2d.z.3d = array(m2d, dim=c(dim(m2d)[1:2], length(items.2)))
  m3d - m2d.z.3d
}
funSweep <- function() sweep(m3d, c(1, 2), m2d, "-")

n = 1

Unit: microseconds
         expr    min      lq     mean  median      uq    max neval   cld
 funRecycle()  1.110  1.3875  1.65388  1.6650  1.9420  2.775   100 a    
   funAperm() 17.200 19.1420 21.23113 20.2520 20.9455 69.077   100    d 
   funOuter() 14.426 15.8130 17.58316 17.2005 18.1710 35.232   100   c  
   funArray()  2.774  3.3300  3.95079  3.8840  4.1610 14.148   100  b   
   funSweep() 31.903 32.7360 34.84129 33.5680 34.4000 62.141   100     e

n=100

Unit: milliseconds
         expr       min        lq      mean    median        uq       max
 funRecycle()  28.51351  32.35671  37.13257  33.98931  39.94408  85.94085
   funAperm() 232.69297 276.07494 344.70083 352.40273 395.50492 569.54978
   funOuter()  35.25947  43.98674  53.06895  49.72790  55.93677  95.38608
   funArray()  96.78482 110.10501 119.68267 116.50378 120.70943 172.53973
   funSweep() 150.88675 168.90293 193.06270 178.11013 216.79349 291.23719

I'm surprised by the results. At large n, multiplying everything by 1 with outer() becomes faster than simply replicating the array with array(). (At large n, outer() looks like it might become faster than even the recycle approach.)

It appears that if we have to make the comparison across a different index (as in funAperm), building the array with outer() will be much faster in all cases.

Any suggestions aside from aperm to make this comparison across an arbitrary index?

Assuming that you meant (I'm assuming this because otherwise m3d - m2d.z.3d doesn't work):

m3d = outer(m2d, items3, "*") # note how I switched the arguments

Then this works:

m3d - c(m2d)

To prove it:

all.equal(m3d - c(m2d), m3d - m2d.z.3d)
# [1] TRUE

Here we just take advantage of vector recycling, since we want to repeat along the last dimension. We need c() to drop the dimensions; otherwise R will complain that the arrays are non-conformable (though they actually are conformable in the specific sense we want here).
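To make the role of c() concrete, here is a small sketch using the 3 x 3 x 2 example from the question's update (the tryCatch wrapper is only there so the snippet runs to the end):

```r
items12 <- c(1, 2, 3)
items3  <- c(1, 2)
m2d <- outer(items12, items12, "-")   # 3 x 3
m3d <- outer(m2d, items3, "*")        # 3 x 3 x 2

# Without c(): both operands carry a dim attribute and the dims differ,
# so R raises an error instead of recycling.
res <- tryCatch(m3d - m2d, error = function(e) conditionMessage(e))

# With c(): m2d is a plain length-9 vector, recycled once per slice
# along the last dimension. The result keeps the dims of m3d.
d <- m3d - c(m2d)
dim(d)
```

Each slice d[, , k] should then equal m3d[, , k] - m2d.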

Based on a perfunctory review of the R source code (src/main/arithmetic.c:real_binary()), it looks like vector recycling does not duplicate the recycled vector, so this should be both fast and memory efficient.

If we wanted to do this along an arbitrary dimension, we would have to permute the array's dimensions with aperm so that the relevant dimension comes last, and then permute the result back to the original dimension order. This would add some overhead.
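A minimal sketch of that permute, recycle, permute-back idea. The helper name subtract_along and its arguments are hypothetical, not part of base R, and it assumes x holds the dimensions of arr with dimension k removed, in their original order:

```r
# Hypothetical helper: subtract x from every slice of arr taken along
# dimension k, using aperm plus vector recycling.
subtract_along <- function(arr, x, k) {
  nd   <- length(dim(arr))
  perm <- c(setdiff(seq_len(nd), k), k)      # move dimension k to the end
  stopifnot(length(x) == prod(dim(arr)[-k])) # x must fill exactly one slice
  res  <- aperm(arr, perm) - c(x)            # recycle x along the last dim
  aperm(res, order(perm))                    # order(perm) inverts the permutation
}

arr <- array(seq_len(24), dim = c(2, 3, 4))
x   <- matrix(1:8, nrow = 2)                 # 2 x 4, matches dims 1 and 3 of arr
out <- subtract_along(arr, x, k = 2)         # out[i, j, l] is arr[i, j, l] - x[i, l]
```

For comparison, sweep(arr, c(1, 3), x) should give the same result, since its MARGIN argument names the dimensions that x spans; as the timings here suggest, it tends to be slower.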

As to which method to choose: if you are not running out of memory, go with the fast method (i.e. avoid the loop in favor of fully vectorized operations).

Also, some benchmarks with items12 <- seq(100) and items3 <- seq(50):

funOuter <- function() {
  m2d.z.3d = outer(
    m2d, 
    rep(1, length(items3)), "*"
  )
  m3d - m2d.z.3d
}
funRecycle <- function() m3d - c(m2d)
funLoop <- function() apply(m3d, 3, "-", m2d)    # result does not match the others: `apply` flattens each slice into a column (10000 x 50 matrix) instead of rebuilding the 3d array
funSweep <- function() sweep(m3d, c(1, 2), m2d)  # same idea (FUN defaults to "-"), but keeps the dimensions of m3d

library(microbenchmark)
microbenchmark(funOuter(), funRecycle(), funLoop(), funSweep())

Produces:

Unit: milliseconds
         expr       min        lq      mean    median
   funOuter()  2.297287  2.673768  3.232277  2.835404
 funRecycle()  1.327101  1.485082  2.093252  1.599543
    funLoop() 22.579010 24.586667 27.211804 26.840069
   funSweep() 11.251656 12.012664 13.516147 13.736908       

And check results:

all.equal(funOuter(), funRecycle())
# [1] TRUE
all.equal(funOuter(), funSweep())
# [1] TRUE
all.equal(funOuter(), funLoop())
# Nope, not equal
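
If the loop version is still wanted, its result can probably be brought in line by restoring the dimensions afterwards. A small sketch with smaller inputs than the benchmark (the names loop_raw and loop_fix are made up for illustration):

```r
items12 <- seq(4)
items3  <- seq(3)
m2d <- outer(items12, items12, "-")        # 4 x 4
m3d <- outer(m2d, items3, "*")             # 4 x 4 x 3

# apply() returns one flattened slice per column: a 16 x 3 matrix here.
loop_raw <- apply(m3d, 3, "-", m2d)

# The columns are in column-major slice order, so re-dimensioning
# recovers the 4 x 4 x 3 array.
loop_fix <- array(loop_raw, dim = dim(m3d))
all.equal(loop_fix, m3d - c(m2d))
```

This only fixes the shape; the loop is still the slowest of the options timed above.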
