
R: how to vapply across rows for xts object?

I have the following xts object.

library(xts)
x <- structure(c(30440.5, 30441, 30441.5, 30441.5, 30441, 30439.5, 30440.5, 30441,
                 30441.5, NA, NA, 30439.5, NA, NA, NA, 30441.5, 30441, NA), .indexTZ = "",
               class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"), 
               tclass = c("POSIXct", "POSIXt"), tzone = "", 
               index = structure(c(1519866931.1185, 1519866931.1255, 1519866931.1255, 
                                   1519866931.1905, 1519866931.1905, 1519866931.1915), 
                                 tzone = "", tclass = c("POSIXct", "POSIXt")), 
               .indexFormat = "%Y-%m-%d %H:%M:%OS",
               .Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "y", "z")))
#                              x        y        z
# 2018-03-01 09:15:31.118  30440.5  30440.5       NA
# 2018-03-01 09:15:31.125  30441.0  30441.0       NA
# 2018-03-01 09:15:31.125  30441.5  30441.5       NA
# 2018-03-01 09:15:31.190  30441.5       NA  30441.5
# 2018-03-01 09:15:31.190  30441.0       NA  30441.0
# 2018-03-01 09:15:31.191  30439.5  30439.5       NA

How can I write a vapply() call that computes the row-wise mean with mean(..., na.rm = TRUE) and returns a single column like this?

                               w       
2018-03-01 09:15:31.118  30440.5
2018-03-01 09:15:31.125  30441.0 
2018-03-01 09:15:31.125  30441.5
2018-03-01 09:15:31.190  30441.5 
2018-03-01 09:15:31.190  30441.0 
2018-03-01 09:15:31.191  30439.5

I just couldn't get it to work.

I notice that many answers recommend against vapply() in favor of other functions. However, according to this answer, vapply() is actually the fastest. So which apply function is best here?
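For reference, here is a minimal sketch of the vapply() call the question asks for. It uses a plain numeric matrix `m` (a hypothetical stand-in holding the question's values) so it runs in base R without xts. vapply() loops over row indices, and FUN.VALUE = numeric(1) declares that each call returns exactly one number. With the real xts object, the resulting vector would still need to be wrapped back with xts(w, index(x)).

```r
# Hypothetical stand-in for the question's xts data: same values, plain matrix.
m <- matrix(c(30440.5, 30441, 30441.5, 30441.5, 30441, 30439.5,
              30440.5, 30441, 30441.5, NA, NA, 30439.5,
              NA, NA, NA, 30441.5, 30441, NA),
            ncol = 3, dimnames = list(NULL, c("x", "y", "z")))

# Loop over row indices; each call must return a single numeric value.
w <- vapply(seq_len(nrow(m)),
            function(i) mean(m[i, ], na.rm = TRUE),
            FUN.VALUE = numeric(1))
print(w)  # 30440.5 30441.0 30441.5 30441.5 30441.0 30439.5, the desired column w
```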

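I would not use vapply() if you want the mean of the columns for each row. I would use rowMeans(), and note that you have to convert the result back to xts, because rowMeans() returns a plain numeric vector rather than an xts object. (The timestamps printed below differ from the question's only because they are shown in my local time zone and without fractional seconds; the index values passed in are the same.)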

(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
#                        [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
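The na.rm = TRUE argument matters here: every row of the question's data contains at least one NA, so without it rowMeans() would return NA for every row. A small base-R sketch on a hypothetical matrix shows both cases and checks that rowMeans(..., na.rm = TRUE) gives the same values as calling mean() on each row with apply().

```r
# Hypothetical 3 x 3 matrix in which every row contains one NA.
m <- rbind(c(1, 2, NA),
           c(NA, 4, 6),
           c(7, NA, 9))

rowMeans(m)                # NA NA NA: any NA makes the row mean NA
rowMeans(m, na.rm = TRUE)  # 1.5 5.0 8.0: NAs are dropped before averaging

# Same values as applying mean() row by row.
all.equal(rowMeans(m, na.rm = TRUE), apply(m, 1, mean, na.rm = TRUE))  # TRUE
```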

And I would use apply() for a generic function that doesn't have a specialized implementation like rowMeans(). Note that you will need to transpose the result with t() if the function returns more than one value, because in that case apply() returns one column per input row.

(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
#                        [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
#                        [,1]    [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5
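To see why the transpose is needed: when FUN returns k values, apply(X, 1, FUN) returns a k x n matrix (one column per input row), not n x k. A base-R sketch with a hypothetical 4 x 3 matrix:

```r
# Hypothetical 4 x 3 matrix (4 "timestamps", 3 series).
m <- matrix(c(5, 1, 3,
              2, 8, 4,
              9, 7, 6,
              0, 2, 1), nrow = 4, byrow = TRUE)

r <- apply(m, 1, range)  # range() returns 2 values per row
dim(r)                   # 2 4: one column per original row

rt <- t(r)               # transpose back to one row per original row
dim(rt)                  # 4 2
rt[1, ]                  # 1 5: min and max of the first row
```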

To address the comment "why not use vapply()?", here are some benchmarks (using the data from the Code Review Q/A the OP linked to):

library(xts)
set.seed(21)
# 1000 daily observations of 6 integer series
xz <- xts(replicate(6, sample(1:100, 1000, replace = TRUE)),
          order.by = Sys.Date() + 1:1000)
# Three ways to compute row means, each returning an xts object
xrowmean <- function(x) { xts(rowMeans(x, na.rm = TRUE), index(x)) }
xapply <- function(x) { as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct") }
xvapply <- function(x) {
  xts(vapply(seq_len(nrow(x)), function(i) mean(x[i, ], na.rm = TRUE),
             FUN.VALUE = numeric(1)), index(x))
}

library(microbenchmark)
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
#          expr       min         lq       mean     median         uq       max neval
#  xrowmean(xz)   169.496   188.8505   207.1931   204.2455   219.4945   285.329   100
#    xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353   100
#   xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961   100

So, why not use vapply()? In these benchmarks it is no faster than apply(), and both are more than 150 times slower than rowMeans() by median time. It's also quite a bit more verbose than the apply() version, and it's not clear there's much benefit to the safety of the 'pre-specified return value' if you control the type of object and the function being called. That said, you're not going to do any harm by using vapply(). I simply prefer apply() for this case.
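The 'pre-specified return value' is what vapply() adds over sapply(): if FUN ever returns something other than what FUN.VALUE describes, vapply() stops with an error instead of silently changing the shape of the result. A small base-R sketch, using a hypothetical matrix and a deliberately wrong FUN.VALUE:

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # hypothetical 2 x 3 matrix

# sapply() quietly simplifies to a 2 x 2 matrix because range() returns 2 values.
s <- sapply(seq_len(nrow(m)), function(i) range(m[i, ]))
dim(s)  # 2 2

# vapply() checks every result against FUN.VALUE and refuses the mismatch.
res <- tryCatch(
  vapply(seq_len(nrow(m)), function(i) range(m[i, ]), FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
print(res)  # character string: vapply()'s length-mismatch error message
```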

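You could transpose it, so that each row becomes a column of a data frame, and call vapply() on those columns:

xxx_row_means <- vapply(
  as.data.frame(t(xxx)),               # each original row is now a column
  function(x) mean(x, na.rm = TRUE),   # mean of one original row
  FUN.VALUE = numeric(length = 1L)     # each call must return one number
)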
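Transposing to a data frame works because a data frame is a list of columns, so vapply() visits each original row as one element. A lighter alternative, swapped in here and assuming R >= 3.6.0 (where base asplit() exists), splits the matrix into a list of rows directly. A sketch on a hypothetical matrix:

```r
m <- rbind(c(1, NA, 3),
           c(4, 5, NA))  # hypothetical 2 x 3 matrix with NAs

# Route from the answer: transpose so rows become data-frame columns.
via_df <- vapply(as.data.frame(t(m)), function(r) mean(r, na.rm = TRUE),
                 FUN.VALUE = numeric(1))

# Alternative: asplit(m, 1) returns a list with one element per row.
via_asplit <- vapply(asplit(m, 1), function(r) mean(r, na.rm = TRUE),
                     FUN.VALUE = numeric(1))

# Both agree with rowMeans(); as.numeric() drops names before comparing.
all.equal(as.numeric(via_df), as.numeric(rowMeans(m, na.rm = TRUE)))      # TRUE
all.equal(as.numeric(via_asplit), as.numeric(rowMeans(m, na.rm = TRUE)))  # TRUE
```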

Or you could simply use the rowMeans() function (with na.rm = TRUE, since every row contains an NA):

xxx_row_means <- rowMeans(xxx, na.rm = TRUE)

Hope that works.

However, an easier and faster way than the vapply() version is to just use the normal apply() function.

UPDATE: rowMeans() is much faster than both:

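# vapply() over the columns of the transposed data
fun1 <- function() {
  vapply(as.data.frame(t(xxx)), mean, FUN.VALUE = numeric(length = 1L), na.rm = TRUE)
}

# apply() over rows
fun2 <- function() {
  apply(xxx, 1, mean, na.rm = TRUE)
}

# Specialized row-mean implementation
fun3 <- function() {
  rowMeans(xxx, na.rm = TRUE)
}

microbenchmark::microbenchmark(fun1(), fun2(), fun3())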
Unit: microseconds
   expr     min       lq      mean   median       uq      max neval
 fun1() 288.396 303.4080 413.70495 341.1360 380.6420 5039.409   100
 fun2() 242.173 253.6300 327.49453 266.6665 319.0125 3305.878   100
 fun3()   7.506  10.6665  38.83471  18.7655  23.7035 1950.025   100  

Either way, to get the desired output I would store the result in w and create a data frame with data.frame(dttm = index(xxx), w = w), or keep it as xts with xts(w, index(xxx)).
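Naming the arguments with = (not <-) is what makes dttm and w the column names; with <- the assignment runs inside the call and the column gets a name derived from the whole expression. A base-R sketch, using a hypothetical POSIXct vector in place of index(xxx) and hypothetical row means as w:

```r
# Hypothetical timestamps standing in for index(xxx).
dttm <- as.POSIXct("2018-03-01 09:15:31", tz = "UTC") + c(0.118, 0.125, 0.191)
w <- c(30440.5, 30441.0, 30439.5)  # hypothetical row means

good <- data.frame(dttm = dttm, w = w)  # '=' names the columns
names(good)                             # "dttm" "w"

bad <- data.frame(dttm2 <- dttm, w)     # '<-' assigns inside the call
names(bad)[1]                           # not "dttm2": built from the expression text
```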

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM