Faster way to run sapply in a for loop

Question

I'm trying to find a faster way to run a function that looks for the median value of every day in a time period, separately for each group. Is there a faster way than running sapply in a for loop?

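all <- list()  # results: one vector of daily medians per group
for(z in levels(df$group)){
  # median of y for each day in `period`, restricted to group z
  all[[z]] <- sapply(period, function(x) median(df[x == df$date & df$group == z, 'y']))
}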

Sample data:

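# note: runif() adds fractional days, so these Dates are not whole days
date <- as.Date("2011-11-01") +
  runif(1000,
        max = as.integer(
          as.Date("2012-12-31") -
            as.Date("2011-11-01")))
df <- data.frame(date = date, y = rnorm(1000), group = factor(rep(letters[1:4], each = 250)))
# period can only be built once df exists: every day from the first to the last date
period <- as.Date(min(df$date):max(df$date), origin = "1970-01-01")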

Answer 1

If I understand correctly, you want to split by group and then calculate the median within each date. Here's a data.table solution.

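Edit: The problem was with the dates in your dataset. They are built by adding runif() offsets, so they carry fractional days: values that print as the same date still count as different, and the number of unique dates comes out wrong. So I had to recast them to POSIXct format via their character representation, so that dates falling on the same day compare as equal.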

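# Recast via the printed date string so that same-day values compare equal
df$date <- as.POSIXct(as.character(df$date), format = "%Y-%m-%d")
library(data.table)
dt <- data.table(df)

setkey(dt, "date")
# One row per date; one column (V1..V4) per group with the median of y
dt.out <- dt[, lapply(letters[1:4],
                      function(x) median(y[group == x])), by = date]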

This is identical to Victor's output.
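A minimal base-R sketch of the date problem described in the edit, using two made-up values: adding a fractional offset to a Date keeps the fraction, so two values that print as the same day are still distinct. Flooring the underlying day count is one way to strip it; that is an alternative assumed here for illustration, not what the answer above does.

```r
# Two Dates on the same calendar day but with different fractional offsets
# (this is what runif() produces in the question's sample data).
d <- as.Date("2011-11-01") + c(0.25, 0.75)

print(unclass(d))            # underlying day counts keep the .25 / .75 fractions
n_raw <- length(unique(d))   # 2: the values differ even though they print alike

# Drop the fraction by flooring the day count, then rebuild the Date.
d_fixed <- as.Date(floor(as.numeric(d)), origin = "1970-01-01")
n_fixed <- length(unique(d_fixed))  # 1: both values now map to 2011-11-01
```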

Answer 2

Here is a solution using the base R function tapply:

tapply(df$y, df$date, median)
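To make the tapply() result concrete, here is a sketch on made-up toy data. It also shows one way, assuming you want NA for days with no observations, to get an entry for every calendar day as the question's `period` does: give the factor all days as levels. The names `all_days` and `m_full` are illustrative only.

```r
# Toy data (made up): two days, two observations each.
y    <- c(1, 3, 2, 10)
date <- as.Date(c("2012-01-01", "2012-01-01", "2012-01-02", "2012-01-02"))

# tapply() turns `date` into a factor and applies median() to each slice of y.
# Result: a 1-d array named by date, 2012-01-01 -> 2, 2012-01-02 -> 6.
m <- tapply(y, date, median)
print(m)

# Only observed dates appear above. To cover every day in a range, pass a
# factor whose levels are all the days; days without data come back as NA.
all_days <- seq(min(date), max(date) + 1, by = "day")
m_full <- tapply(y, factor(as.character(date), levels = as.character(all_days)), median)
print(m_full)
```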

Update: judging by your comment above, you need one column for each group? That's also a one-liner:

tapply(df$y, list(df$date, df$group), median)
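A sketch of the two-factor version on made-up toy data: the result is a matrix with one row per date and one column per group level, and (date, group) cells with no rows come back as NA. The conversion to a data frame at the end is an optional extra step assumed here, not part of the answer.

```r
# Toy data (made up): two days, groups a/b/c; group "c" has no rows at all.
y     <- c(1, 3, 5, 2, 10)
date  <- as.Date(c("2012-01-01", "2012-01-01", "2012-01-01",
                   "2012-01-02", "2012-01-02"))
group <- factor(c("a", "a", "b", "a", "b"), levels = c("a", "b", "c"))

# Rows = dates, columns = group levels; empty (date, group) cells are NA.
m <- tapply(y, list(date, group), median)
print(m)

# Optional: turn the matrix into a data frame with a real Date column.
out <- data.frame(date = as.Date(rownames(m)), m, row.names = NULL)
print(out)
```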


Answer 1: score 4, 2013-01-29 00:40:15

Answer 2: score 2, accepted, 2013-01-29 01:25:43
