A faster way of running sapply in a for loop
I'm trying to find a faster way to run a function that computes the median value for every day in a time period, separately for each group. Is there a faster way than running sapply in a for loop?
all <- list()  # result container: one vector of daily medians per group
for (z in unique(as.factor(df$group))) {
  all[[z]] <- sapply(period, function(x) median(df[x == df$date & df$group == z, 'y']))
}
Sample data:
date<-as.Date("2011-11-01") +
runif( 1000,
max=as.integer(
as.Date( "2012-12-31") -
as.Date( "2011-11-01")))
df <- data.frame(date = date, y = rnorm(1000), group = factor(rep(letters[1:4], each = 250)))
# period depends on df, so it has to be built after df exists
period <- as.Date(min(df$date):max(df$date), origin = "1970-01-01")
If I understand correctly, you want to split by group and then calculate the median within each date. Here's a data.table solution.
Edit: The problem was with the date format of your dataset. It seems to report the number of unique elements incorrectly, so I had to recast it to POSIXct format.
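# Recast via character so that rows on the same calendar day get the same value
df$date <- as.POSIXct(as.character(df$date), format = "%Y-%m-%d")
library(data.table)
dt <- data.table(df)
setkey(dt, "date")
# One row per date; one median column per group, in the order a, b, c, d
dt.out <- dt[, lapply(letters[1:4],
                      function(x) median(y[group == x])), by = date]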
This is identical to Victor's output.
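For readers without data.table, here is a minimal base R sketch of the same idea. It is not the approach used in the answer above, just an alternative using tapply. It also shows why the unique count looks off: the sample dates are built with runif, so they carry fractional days, and dates that print identically are still different values. The seed, the day column, and the names n_raw, n_days, and med are assumptions added for illustration only.

```r
set.seed(1)  # assumption: seed added only to make the sketch reproducible

# Sample data as in the question; runif() adds fractional days to each Date
date <- as.Date("2011-11-01") +
  runif(1000, max = as.integer(as.Date("2012-12-31") - as.Date("2011-11-01")))
df <- data.frame(date = date, y = rnorm(1000),
                 group = factor(rep(letters[1:4], each = 250)))

# Dates that print identically can still differ in their fractional part
n_raw  <- length(unique(df$date))          # distinct underlying values
n_days <- length(unique(format(df$date)))  # distinct calendar days

# Truncate to whole days so equal calendar days compare equal
df$day <- as.Date(floor(as.numeric(df$date)), origin = "1970-01-01")

# Median of y for every (day, group) pair in one call;
# cells with no observations are NA
med <- tapply(df$y, list(day = format(df$day), group = df$group), median)
```

The result is a matrix with one row per observed day and one column per group, so a single value can be looked up by name, for example med[format(df$day[1]), "a"]. Days with no observations at all do not appear as rows, unlike the period vector in the question.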