I have the following xts object.
x <- structure(c(30440.5, 30441, 30441.5, 30441.5, 30441, 30439.5, 30440.5, 30441,
30441.5, NA, NA, 30439.5, NA, NA, NA, 30441.5, 30441, NA), .indexTZ = "",
class = c("xts", "zoo"), .indexCLASS = c("POSIXct", "POSIXt"),
tclass = c("POSIXct", "POSIXt"), tzone = "",
index = structure(c(1519866931.1185, 1519866931.1255, 1519866931.1255,
1519866931.1905, 1519866931.1905, 1519866931.1915),
tzone = "", tclass = c("POSIXct", "POSIXt")),
.indexFormat = "%Y-%m-%d %H:%M:%OS",
.Dim = c(6L, 3L), .Dimnames = list(NULL, c("x", "y", "z")))
# x y z
# 2018-03-01 09:15:31.118 30440.5 30440.5 NA
# 2018-03-01 09:15:31.125 30441.0 30441.0 NA
# 2018-03-01 09:15:31.125 30441.5 30441.5 NA
# 2018-03-01 09:15:31.190 30441.5 NA 30441.5
# 2018-03-01 09:15:31.190 30441.0 NA 30441.0
# 2018-03-01 09:15:31.191 30439.5 30439.5 NA
How can I write the vapply call to obtain the mean across rows with mean(..., na.rm = TRUE) such that it returns a single column like this?
w
2018-03-01 09:15:31.118 30440.5
2018-03-01 09:15:31.125 30441.0
2018-03-01 09:15:31.125 30441.5
2018-03-01 09:15:31.190 30441.5
2018-03-01 09:15:31.190 30441.0
2018-03-01 09:15:31.191 30439.5
I just couldn't get it working.
I notice that many answers recommend against vapply in favor of other functions. However, according to this answer, vapply is actually the fastest. So which apply function is best here?
I would not use vapply if you want the mean of the columns for each row. I would use rowMeans, and note that you have to convert the result back to xts.
(xmean <- xts(rowMeans(x, na.rm = TRUE), index(x)))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
And I would use apply for a generic function that doesn't have a specialized implementation. Note that you will need to transpose the result if the function returns more than one value.
(xmin <- as.xts(apply(x, 1, min, na.rm = TRUE), dateFormat = "POSIXct"))
# [,1]
# 2018-02-28 19:15:31 30440.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.5
# 2018-02-28 19:15:31 30441.0
# 2018-02-28 19:15:31 30439.5
(xrange <- as.xts(t(apply(x, 1, range, na.rm = TRUE)), dateFormat = "POSIXct"))
# [,1] [,2]
# 2018-02-28 19:15:31 30440.5 30440.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.5 30441.5
# 2018-02-28 19:15:31 30441.0 30441.0
# 2018-02-28 19:15:31 30439.5 30439.5
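To illustrate why the transpose is needed, here is a minimal sketch on a plain base R matrix (the matrix m and its values are made up for illustration and are not the xts data above). When the function returns several values per row, apply() stores each row's result as a column:

```r
# Plain matrix standing in for the xts data (values are illustrative).
m <- matrix(c(1, 4, NA,
              2, NA, 6,
              3, 5, 9),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("x", "y", "z")))

# One value per row: apply() returns a vector of length nrow(m).
row_min <- apply(m, 1, min, na.rm = TRUE)

# Two values per row: apply() stores each row's result as a COLUMN,
# so the raw result is 2 x nrow(m) and has to be transposed.
row_range_raw <- apply(m, 1, range, na.rm = TRUE)
row_range <- t(row_range_raw)

dim(row_range_raw)
dim(row_range)
```

After t(), there is again one row per observation, which is the shape as.xts() expects alongside the time index.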
To address the comment of "why not use vapply()", here are some benchmarks (using the data from the code review Q/A the OP linked to):
library(xts)
library(microbenchmark)

set.seed(21)
xz <- xts(replicate(6, sample(1:100, 1000, replace = TRUE)),
          order.by = Sys.Date() + 1:1000)
# Vectorized row means, rebuilt as xts on the original index
xrowmean <- function(x) { xts(rowMeans(x, na.rm = TRUE), index(x)) }
# Generic row-wise apply()
xapply <- function(x) { as.xts(apply(x, 1, mean, na.rm = TRUE), dateFormat = "POSIXct") }
# Row-by-row vapply() with a declared numeric(1) return value
xvapply <- function(x) {
  xts(vapply(seq_len(nrow(x)), function(i) mean(x[i, ], na.rm = TRUE),
             FUN.VALUE = numeric(1)), index(x))
}
microbenchmark(xrowmean(xz), xapply(xz), xvapply(xz))
# Unit: microseconds
# expr min lq mean median uq max neval
# xrowmean(xz) 169.496 188.8505 207.1931 204.2455 219.4945 285.329 100
# xapply(xz) 33477.542 34203.3260 35698.0503 35076.4655 36821.1320 43910.353 100
# xvapply(xz) 32709.238 35010.1920 37514.7557 35884.3585 37972.7085 84409.961 100
So, why not use vapply()? It doesn't add much in the way of performance benefit. It's quite a bit more verbose than the apply() version, and it's not clear there's much benefit to the safety of the 'pre-specified return value' if you have control over the type of object and the function being called. That said, you're not going to do any harm by using vapply(). I simply prefer apply() for this case.
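For readers unfamiliar with the 'pre-specified return value' mentioned above, here is a small sketch of what that safety check does in practice. It uses a plain base R matrix with made-up values rather than the xts object, and the names m, row_mean and bad are purely illustrative:

```r
m <- matrix(c(1, NA, 3,
              4, 5, NA),
            nrow = 2, byrow = TRUE)

# Row means via vapply(): each call must return exactly one numeric value.
row_mean <- vapply(seq_len(nrow(m)),
                   function(i) mean(m[i, ], na.rm = TRUE),
                   FUN.VALUE = numeric(1))

# range() returns two values per row, which violates the numeric(1)
# template, so vapply() signals an error instead of changing shape.
bad <- tryCatch(
  vapply(seq_len(nrow(m)),
         function(i) range(m[i, ], na.rm = TRUE),
         FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
```

Whether that check is worth the extra verbosity is a judgment call; when the function and input type are fixed, as with mean() on numeric data here, it rarely triggers.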
You could transpose it and call vapply (xxx here stands for the xts object from the question):
xxx_row_means <- vapply(
  as.data.frame(t(xxx)),
  function(x) mean(x, na.rm = TRUE),
  FUN.VALUE = numeric(length = 1L)
)
Or you could simply use the rowMeans() function, passing na.rm = TRUE so that rows containing NA do not return NA:
xxx_row_means <- rowMeans(xxx, na.rm = TRUE)
Hope that works.
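Since every row of the question's data contains at least one NA, the na.rm argument matters for rowMeans(). A minimal sketch on a made-up base R matrix (m, no_rm and with_rm are illustrative names, not part of the original answer):

```r
m <- matrix(c(10, 10, NA,
              20, NA, 30,
              NA, NA, NA),
            nrow = 3, byrow = TRUE)

# Default na.rm = FALSE: any NA in a row makes that row's mean NA.
no_rm <- rowMeans(m)

# na.rm = TRUE drops the NAs first; a row with no values left gives NaN.
with_rm <- rowMeans(m, na.rm = TRUE)
```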
However, the easier and faster way to do this is to just use the normal apply function.
UPDATE: rowMeans is much faster.
fun1 <- function() {
  vapply(as.data.frame(t(xxx)), mean, na.rm = TRUE, FUN.VALUE = numeric(length = 1L))
}
fun2 <- function() {
  apply(xxx, 1, mean, na.rm = TRUE)
}
fun3 <- function() {
  rowMeans(xxx, na.rm = TRUE)
}
microbenchmark::microbenchmark(fun1(), fun2(), fun3())
Unit: microseconds
expr min lq mean median uq max neval
fun1() 288.396 303.4080 413.70495 341.1360 380.6420 5039.409 100
fun2() 242.173 253.6300 327.49453 266.6665 319.0125 3305.878 100
fun3() 7.506 10.6665 38.83471 18.7655 23.7035 1950.025 100
In any case, to get the desired output I would call the result w and create a data frame with data.frame(dttm = index(xxx), w). Note the = rather than <-, so that dttm becomes the column name.
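As a rough sketch of that last step without needing the xts object at hand, the following uses hypothetical stand-ins: idx plays the role of index(xxx) and w holds three example row means. Both are assumptions for illustration only.

```r
# Hypothetical stand-ins: `idx` plays the role of index(xxx),
# `w` the row means computed earlier.
idx <- as.POSIXct("2018-03-01 09:15:31", tz = "UTC") + c(0.118, 0.125, 0.190)
w <- c(30440.5, 30441.0, 30441.5)

# `=` names the column. Writing `dttm <- idx` here would instead assign
# a variable called dttm and leave the column with an auto-generated name.
res <- data.frame(dttm = idx, w = w)
str(res)
```

Keeping the timestamps as a regular column like this loses the xts indexing features, so it is mainly useful when the result feeds into code that expects a data frame.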