How to use the same function on different intervals of a vector in R without loops or mapply?

Suppose I have a data frame such as:

         Date    Value
1  2014-04-14   830.61
2  2014-04-11   815.69
3  2014-04-10   833.08
4  2014-04-09   872.18
5  2014-04-08   851.96
6  2014-04-07   845.04
7  2014-04-04   865.09
8  2014-04-03   888.77
9  2014-04-02   890.90
10 2014-04-01   885.52

Let's call it DF. Suppose I have also defined the minimum and maximum row indices of each interval:

minvals<-c(1,2,3)
maxvals<-c(5,7,10)
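To make the example reproducible, the data frame can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and storing Date as a Date column is an assumption, since the original column type is not shown.

```r
# Rebuild the example data frame; values are copied from the printed table.
# Assumption: Date is stored as class Date (the original type is not shown).
DF <- data.frame(
  Date = as.Date(c("2014-04-14", "2014-04-11", "2014-04-10", "2014-04-09",
                   "2014-04-08", "2014-04-07", "2014-04-04", "2014-04-03",
                   "2014-04-02", "2014-04-01")),
  Value = c(830.61, 815.69, 833.08, 872.18, 851.96,
            845.04, 865.09, 888.77, 890.90, 885.52)
)
```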

I want to apply a function (e.g. the mean or standard deviation of the Value column) to each interval. For example, take the first interval and its mean:

DF[minvals[1]:maxvals[1], ]

         Date    Value
1  2014-04-14   830.61
2  2014-04-11   815.69
3  2014-04-10   833.08
4  2014-04-09   872.18
5  2014-04-08   851.96

mean(DF[minvals[1]:maxvals[1],"Value"])
#840.704

and likewise for the other minvals and maxvals. The first thing that comes to mind is mapply, but my real data has thousands of minvals and maxvals. Is it possible to do this efficiently?

P.S. In fact, this is quite similar to a rolling mean, but my Date column includes only workdays, so I am not sure whether the rollmean function of the zoo package can handle it. In any case, assume that my intervals are not regular either.

Try data.table:

DFvec <- DF$Value
Ints <- data.frame(MIN = c(1,2,3), MAX = c(5,7,10))
library(data.table)
setDT(Ints)[, MEAN := mean(DFvec[MIN:MAX]), by = c("MIN", "MAX")]
Ints
##    MIN MAX     MEAN
## 1:   1   5 840.7040
## 2:   2   7 847.1733
## 3:   3  10 866.5675

Another way:

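library(data.table)
minvals <- as.integer(minvals)
maxvals <- as.integer(maxvals)
lenvals <- maxvals - minvals + 1L
# vecseq() is unexported (hence :::), so it is not part of data.table's public API
ix  <- data.table:::vecseq(minvals, lenvals, sum(lenvals))  # row indices, interval by interval
grp <- rep(seq_along(lenvals), lenvals)                      # interval id for each index

setDT(DF[ix, ])[, list(Value = mean(Value)), by = grp]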
#    grp    Value
# 1:   1 840.7040
# 2:   2 847.1733
# 3:   3 866.5675
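Because data.table:::vecseq is an internal function, it may help to see the same index construction in base R. The following is a sketch, not the answerer's method: it swaps in sequence() for the index vector and rowsum() for the grouped sum, and it redefines the example values so that it runs on its own.

```r
# Example data, copied from the question.
Value   <- c(830.61, 815.69, 833.08, 872.18, 851.96,
             845.04, 865.09, 888.77, 890.90, 885.52)
minvals <- c(1L, 2L, 3L)
maxvals <- c(5L, 7L, 10L)
lenvals <- maxvals - minvals + 1L

# sequence(lenvals) gives 1..len for each interval; adding (start - 1)
# shifts each run so that it begins at its own minval.
ix  <- sequence(lenvals) + rep(minvals - 1L, lenvals)
grp <- rep(seq_along(lenvals), lenvals)

# rowsum() sums the values within each group in a single vectorised call;
# dividing by the interval lengths turns the sums into means.
means <- as.vector(rowsum(Value[ix], grp)) / lenvals
means
```

Like the vecseq version, this materialises every index of every interval, so memory use grows with the total length of all intervals.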

Here is the mapply solution. If that is too slow (please give a reproducible example of your problem size), you could probably do something clever with data.table or use Rcpp.

x <- DF[["Value"]] #avoid data.frame subsetting in a loop
mapply(function(i1, i2) mean.default(x[i1:i2]), minvals, maxvals)

Benchmarks with 1e5 intervals:

library(microbenchmark)
set.seed(42)
i <- sample(1:3, 1e5, TRUE)
minvals<-c(1,2,3)[i]
maxvals<-c(5,7,10)[i]
microbenchmark(mapply(function(i1, i2) mean.default(x[i1:i2]), minvals, maxvals), times=10)

Unit: milliseconds
                                                             expr      min       lq   median       uq      max neval
mapply(function(i1, i2) mean.default(x[i1:i2]), minvals, maxvals) 446.0529 473.4267 489.2375 523.2335 595.5536    10
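If the function is specifically the mean (or a sum), the per-interval calls can be avoided altogether with prefix sums. This is a different technique from the mapply call above and only a sketch: it does not generalise to arbitrary functions, and on long vectors the differences of large cumulative sums can lose some floating-point precision.

```r
# Example data, copied from the question.
x       <- c(830.61, 815.69, 833.08, 872.18, 851.96,
             845.04, 865.09, 888.77, 890.90, 885.52)
minvals <- c(1, 2, 3)
maxvals <- c(5, 7, 10)

# cs[k + 1] holds sum(x[1:k]); the leading 0 covers intervals starting at 1.
cs <- c(0, cumsum(x))

# Sum over x[min:max] is cs[max + 1] - cs[min]; divide by the length for the mean.
means <- (cs[maxvals + 1] - cs[minvals]) / (maxvals - minvals + 1)
means
```

The whole computation is a handful of vectorised operations, so its cost depends on the length of x and the number of intervals rather than on the total length of all intervals.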

Here are several approaches. It's not clear from the description that efficiency is really important here, and readability might matter more:

# they all use this:
DF.Value <- DF$Value

# 1
sapply(seq_along(minvals), function(i) mean(DF.Value[minvals[i]:maxvals[i]]))

# 2
f <- function(minvals, maxvals) mean(DF.Value[minvals:maxvals])
mapply(f, minvals, maxvals)

# 3 - this one assumes that minvals equals seq_along(minvals) which is true in example
library(zoo)
w <- maxvals - minvals + 1
rollapply(DF.Value, w, mean, align = "left")[minvals]
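Since the question also mentions the standard deviation, the pattern of approach 1 can be wrapped so that the summary function becomes a parameter. A minimal sketch, assuming each interval yields a single number; the helper name interval_apply is hypothetical and does not come from any package.

```r
# Example data, copied from the question.
x       <- c(830.61, 815.69, 833.08, 872.18, 851.96,
             845.04, 865.09, 888.77, 890.90, 885.52)
minvals <- c(1, 2, 3)
maxvals <- c(5, 7, 10)

# interval_apply is a hypothetical helper, not part of any package.
# vapply() checks that FUN returns exactly one numeric value per interval.
interval_apply <- function(x, from, to, FUN) {
  stopifnot(length(from) == length(to))
  vapply(seq_along(from), function(i) FUN(x[from[i]:to[i]]), numeric(1))
}

means <- interval_apply(x, minvals, maxvals, mean)
sds   <- interval_apply(x, minvals, maxvals, sd)
```

Using vapply instead of sapply makes the return type explicit, so an unexpected result from FUN raises an error instead of silently changing the shape of the output.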
