How to use the same function on different intervals of a vector in R without loops or mapply?
Suppose I have a data frame such as:
Date Value
1 2014-04-14 830.61
2 2014-04-11 815.69
3 2014-04-10 833.08
4 2014-04-09 872.18
5 2014-04-08 851.96
6 2014-04-07 845.04
7 2014-04-04 865.09
8 2014-04-03 888.77
9 2014-04-02 890.90
10 2014-04-01 885.52
Let's name it DF. Suppose I have also defined the minimum and maximum row index of each interval:
minvals<-c(1,2,3)
maxvals<-c(5,7,10)
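For anyone who wants to reproduce the example, the printed data frame above could be rebuilt roughly as follows. This is only a sketch: storing Date as class Date is an assumption, since the question does not say how the column is typed.

```r
# Rebuild the example data frame from the printed values above.
# Assumption: Date is stored as class "Date" (the question does not specify).
DF <- data.frame(
  Date = as.Date(c("2014-04-14", "2014-04-11", "2014-04-10", "2014-04-09",
                   "2014-04-08", "2014-04-07", "2014-04-04", "2014-04-03",
                   "2014-04-02", "2014-04-01")),
  Value = c(830.61, 815.69, 833.08, 872.18, 851.96,
            845.04, 865.09, 888.77, 890.90, 885.52)
)
str(DF)
```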
I want to apply a function (e.g. the mean or standard deviation of the Value column) to each interval. For example, take the first interval and its mean:
DF[minvals[1]:maxvals[1], ]
Date Value
1 2014-04-14 830.61
2 2014-04-11 815.69
3 2014-04-10 833.08
4 2014-04-09 872.18
5 2014-04-08 851.96
mean(DF[minvals[1]:maxvals[1],"Value"])
#840.704
and likewise for the other minvals and maxvals. The first thing that comes to mind is mapply, but my real data has thousands of such minvals and maxvals. Is it possible to do this in an efficient way?
P.S. This is quite similar to a rolling mean, but my Date column includes only workdays, so I am not sure whether the rollmean function of the zoo package can handle it. In any case, assume that my intervals are not regular either.
Try data.table:
DFvec <- DF$Value
Ints <- data.frame(MIN = c(1,2,3), MAX = c(5,7,10))
library(data.table)
setDT(Ints)[, MEAN := mean(DFvec[MIN:MAX]), by = c("MIN", "MAX")]
Ints
## MIN MAX MEAN
## 1: 1 5 840.7040
## 2: 2 7 847.1733
## 3: 3 10 866.5675
Another way:
minvals = as.integer(minvals)
maxvals = as.integer(maxvals)
lenvals = maxvals - minvals + 1L
# vecseq is an unexported data.table internal (hence :::); it builds the concatenated index runs
ix = data.table:::vecseq(minvals, lenvals, sum(lenvals))
grp = rep(seq_along(lenvals), lenvals)
setDT(DF[ix, ])[, list(Value=mean(Value)), by=grp]
# grp Value
# 1: 1 840.7040
# 2: 2 847.1733
# 3: 3 866.5675
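The answer above relies on data.table:::vecseq, which is not an exported function. As a hedged alternative, the same expand-and-group idea can be sketched in base R, assuming R 4.0.0 or later (where sequence() accepts a from argument). The vector x below simply repeats the Value column from the question so that the snippet runs on its own.

```r
# Value column and interval bounds from the question.
x <- c(830.61, 815.69, 833.08, 872.18, 851.96,
       845.04, 865.09, 888.77, 890.90, 885.52)
minvals <- c(1L, 2L, 3L)
maxvals <- c(5L, 7L, 10L)

lenvals <- maxvals - minvals + 1L
# Concatenated indices 1:5, 2:7, 3:10 (the `from` argument needs R >= 4.0.0).
ix <- sequence(lenvals, from = minvals)
# Interval id for every expanded index.
grp <- rep(seq_along(lenvals), lenvals)
# Per-interval means; swap `mean` for `sd` or another summary function.
res <- tapply(x[ix], grp, mean)
res
```

Note that the expanded index has sum(lenvals) elements, so memory use grows with the total length of all intervals rather than with their number.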
Here is the mapply solution. If that is too slow (give a reproducible example of your problem size), you could probably do something clever with data.table or use Rcpp.
x <- DF[["Value"]] #avoid data.frame subsetting in a loop
mapply(function(i1, i2) mean.default(x[i1:i2]), minvals, maxvals)
Benchmark with 1e5 intervals:
library(microbenchmark)
set.seed(42)
i <- sample(1:3, 1e5, TRUE)
minvals<-c(1,2,3)[i]
maxvals<-c(5,7,10)[i]
microbenchmark(mapply(function(i1, i2) mean.default(x[i1:i2]), minvals, maxvals), times=10)
Unit: milliseconds
expr min lq median uq max neval
mapply(function(i1, i2) mean.default(x[i1:i2]), minvals, maxvals) 446.0529 473.4267 489.2375 523.2335 595.5536 10
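If only means (or sums) are needed and mapply turns out to be too slow, one fully vectorized option is a prefix-sum lookup. This is a different technique from the answers here, offered as a sketch: it works for sums and means only, and on very long vectors the running sum can accumulate floating-point error. The vector x again repeats the Value column from the question.

```r
# Value column and interval bounds from the question.
x <- c(830.61, 815.69, 833.08, 872.18, 851.96,
       845.04, 865.09, 888.77, 890.90, 885.52)
minvals <- c(1, 2, 3)
maxvals <- c(5, 7, 10)

# cs[k + 1] holds sum(x[1:k]); the leading 0 handles intervals starting at 1.
cs <- c(0, cumsum(x))
# Sum over x[lo:hi] is cs[hi + 1] - cs[lo]; divide by the interval length.
means <- (cs[maxvals + 1] - cs[minvals]) / (maxvals - minvals + 1)
means
```

The cost is one pass over x plus one vectorized subtraction, regardless of how long the individual intervals are.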
Here are several approaches. It's not clear from the description that efficiency really matters here; readability might be more important:
# they all use this:
DF.Value <- DF$Value
# 1
sapply(seq_along(minvals), function(i) mean(DF.Value[minvals[i]:maxvals[i]]))
# 2
f <- function(minvals, maxvals) mean(DF.Value[minvals:maxvals])
mapply(f, minvals, maxvals)
# 3 - this one assumes that minvals equals seq_along(minvals) which is true in example
library(zoo)
w <- maxvals - minvals + 1
rollapply(DF.Value, w, mean, align = "left")[minvals]
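If readability is the priority, approach 1 can be wrapped in a small helper so that the same call covers the mean, the standard deviation, or any other summary. The name interval_apply is hypothetical (it is not from any package), and the sketch assumes the function returns a single number per interval.

```r
# Value column and interval bounds from the question.
x <- c(830.61, 815.69, 833.08, 872.18, 851.96,
       845.04, 865.09, 888.77, 890.90, 885.52)
minvals <- c(1, 2, 3)
maxvals <- c(5, 7, 10)

# Hypothetical helper: apply f to x[lo[i]:hi[i]] for every interval i.
# vapply enforces that f returns exactly one numeric value per interval.
interval_apply <- function(x, lo, hi, f) {
  stopifnot(length(lo) == length(hi))
  vapply(seq_along(lo), function(i) f(x[lo[i]:hi[i]]), numeric(1))
}

int_means <- interval_apply(x, minvals, maxvals, mean)
int_sds <- interval_apply(x, minvals, maxvals, sd)
int_means
int_sds
```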