How to calculate column average with range criteria given by two other variables?
Below is a sample dataset.
id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")
X1 <- runif(n=4, min=1, max=10)
X2 <- runif(n=4, min=1, max=10)
X3 <- runif(n=4, min=1, max=10)
X4 <- runif(n=4, min=1, max=10)
X5 <- runif(n=4, min=1, max=10)
X6 <- runif(n=4, min=1, max=10)
X7 <- runif(n=4, min=1, max=10)
X8 <- runif(n=4, min=1, max=10)
X9 <- runif(n=4, min=1, max=10)
X10 <- runif(n=4, min=1, max=10)
X11 <- runif(n=4, min=1, max=10)
X12 <- runif(n=4, min=1, max=10)
df <- data.frame(id,start,end,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12)
colnames(df)<-c("id","start","end","Jan 2001","Feb 2001","Mar 2001","Apr 2001","May 2001","Jun 2001",
"Jul 2001","Aug 2001","Sep 2001","Oct 2001","Nov 2001","Dec 2001")
df
id start end Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001
1 1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2 2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3 3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4 4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
Aug 2001 Sep 2001 Oct 2001 Nov 2001 Dec 2001
1 7.393482 1.865180 5.316736 6.737959 8.783017
2 7.816893 4.021888 7.086448 1.728219 1.553020
3 5.443161 7.489278 9.848638 7.072435 1.294177
4 8.853365 8.899155 5.768139 1.414094 2.322848
I would like to calculate, for each id, the mean of the monthly columns from its start month to its end month (both inclusive). For example:
id  start     end       average
2   Jun 2001  Sep 2001  average of Jun, Jul, Aug and Sep 2001
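To make the target concrete, here is the arithmetic for id 2 done by hand. The four values are copied from the printed sample data above; the variable names are only illustrative.

```r
# Values for id 2 (Jun to Sep 2001), copied from the printed sample data
id2_window <- c("Jun 2001" = 6.989283, "Jul 2001" = 7.375734,
                "Aug 2001" = 7.816893, "Sep 2001" = 4.021888)

# The desired "average" column is the plain mean of that window
id2_average <- mean(id2_window)
id2_average
```

For this row the result is about 6.55, which is the number the final solution should reproduce for id 2 on this particular sample.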
My first thought is to assign an index to each month, so that there is no need to deal with the yearmon data format. That seems to make things easier.
# generate index for month data
df.i <- df
df.i$start.i[df.i$start == "Jan 2001"] <- 1
df.i$start.i[df.i$start == "Feb 2001"] <- 2
df.i$start.i[df.i$start == "Mar 2001"] <- 3
df.i$start.i[df.i$start == "Apr 2001"] <- 4
df.i$start.i[df.i$start == "May 2001"] <- 5
df.i$start.i[df.i$start == "Jun 2001"] <- 6
df.i$start.i[df.i$start == "Jul 2001"] <- 7
df.i$start.i[df.i$start == "Aug 2001"] <- 8
df.i$start.i[df.i$start == "Sep 2001"] <- 9
df.i$start.i[df.i$start == "Oct 2001"] <- 10
df.i$start.i[df.i$start == "Nov 2001"] <- 11
df.i$start.i[df.i$start == "Dec 2001"] <- 12
df.i$end.i[df.i$end == "Jan 2001"] <- 1
df.i$end.i[df.i$end == "Feb 2001"] <- 2
df.i$end.i[df.i$end == "Mar 2001"] <- 3
df.i$end.i[df.i$end == "Apr 2001"] <- 4
df.i$end.i[df.i$end == "May 2001"] <- 5
df.i$end.i[df.i$end == "Jun 2001"] <- 6
df.i$end.i[df.i$end == "Jul 2001"] <- 7
df.i$end.i[df.i$end == "Aug 2001"] <- 8
df.i$end.i[df.i$end == "Sep 2001"] <- 9
df.i$end.i[df.i$end == "Oct 2001"] <- 10
df.i$end.i[df.i$end == "Nov 2001"] <- 11
df.i$end.i[df.i$end == "Dec 2001"] <- 12
colnames(df.i)<-c("id","start","end","1","2","3","4","5","6",
"7","8","9","10","11","12","start.i","end.i")
df.i
id start end 1 2 3 4 5 6 7
1 1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2 2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3 3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4 4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
8 9 10 11 12 start.i end.i
1 7.393482 1.865180 5.316736 6.737959 8.783017 7 8
2 7.816893 4.021888 7.086448 1.728219 1.553020 6 9
3 5.443161 7.489278 9.848638 7.072435 1.294177 5 7
4 8.853365 8.899155 5.768139 1.414094 2.322848 5 11
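As a side note, the month-to-index assignments above could probably be collapsed into two calls to match(), assuming the month labels are listed in calendar order. This is only a sketch; month_labels is a name introduced here for illustration.

```r
# Same start/end values as the sample data in the question
start <- c("Jul 2001", "Jun 2001", "May 2001", "May 2001")
end   <- c("Aug 2001", "Sep 2001", "Jul 2001", "Nov 2001")

# Month labels in calendar order; a label's position in this vector is its index
# (month_labels is a hypothetical helper, not part of the original code)
month_labels <- paste(month.abb, 2001)

# match() returns the position of each label, i.e. 1 for "Jan 2001" ... 12 for "Dec 2001"
start.i <- match(start, month_labels)
end.i   <- match(end, month_labels)
```

The resulting start.i and end.i agree with the columns printed above (7, 6, 5, 5 and 8, 9, 7, 11). Note that month.abb depends on nothing locale-specific in R, but the labels in the data must be spelled exactly like "Jan 2001" for match() to find them.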
Thank you.
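# the month columns were renamed "1".."12" above
yearmonlist <- as.character(1:12)
month.i <- as.numeric(yearmonlist)
df.i$mean <- NA_real_
for (index.r in seq_len(nrow(df.i))) {
  # names of the month columns between this row's start and end index (inclusive)
  keep <- yearmonlist[month.i >= df.i$start.i[index.r] & month.i <= df.i$end.i[index.r]]
  df.i$mean[index.r] <- mean(unlist(df.i[index.r, keep]))
}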
This one seems to work.
Here is your data, with a seed set for reproducibility.
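If the loop becomes slow on a larger table, one possible alternative is to mask out the months outside each row's window and use rowMeans(). The sketch below is an assumption-laden illustration, not the method used above: it builds its own small matrix (vals), and in_window and masked are names made up for this example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
vals <- matrix(runif(4 * 12, min = 1, max = 10), nrow = 4)  # 4 ids x 12 months
start.i <- c(7, 6, 5, 5)   # start month index per row, as in the question
end.i   <- c(8, 9, 7, 11)  # end month index per row, as in the question

month.i <- seq_len(ncol(vals))
# in_window[r, m] is TRUE when month m lies inside row r's start..end range
in_window <- outer(start.i, month.i, "<=") & outer(end.i, month.i, ">=")

# Blank out everything outside the window, then average what is left per row
masked <- vals
masked[!in_window] <- NA
window_mean <- rowMeans(masked, na.rm = TRUE)
```

One caveat with this design: because na.rm = TRUE is used to skip the masked cells, any genuinely missing value inside a window would be skipped silently as well.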
id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")
set.seed(123)
df <- data.frame(id, start, end, matrix(runif(n=4*12, min=1, max=10), ncol=12))
df$start <- as.character(df$start)
df$end <- as.character(df$end)
colnames(df)<-c("id", "start", "end", paste(month.abb, 2001))
You can try apply. It "loops" over every row, subsetting by the column names given in start and end.
Importantly, the start and end values must match column names of df.
Finally, the mean is calculated over that subset.
apply(df, 1, function(x, y) mean(as.numeric(x[which(y == x[2]):which(y == x[3])])), colnames(df))
[1] 5.251895 6.273809 5.537480 6.815905
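One thing to be aware of: apply() on a data frame with mixed column types first converts it to a character matrix, which is why as.numeric() is needed above, and the formatted numbers may carry fewer digits than the originals. A variant that keeps the month values numeric could look like the following sketch; month_cols, vals, from and to are names introduced here for illustration, and stringsAsFactors = FALSE is passed so the as.character() calls are not needed.

```r
id <- c(1, 2, 3, 4)
start <- c("Jul 2001", "Jun 2001", "May 2001", "May 2001")
end <- c("Aug 2001", "Sep 2001", "Jul 2001", "Nov 2001")
set.seed(123)
df <- data.frame(id, start, end,
                 matrix(runif(n = 4 * 12, min = 1, max = 10), ncol = 12),
                 stringsAsFactors = FALSE)
colnames(df) <- c("id", "start", "end", paste(month.abb, 2001))

month_cols <- paste(month.abb, 2001)
vals <- as.matrix(df[, month_cols])     # numeric matrix holding only the month columns
from <- match(df$start, month_cols)     # column position of each row's start month
to   <- match(df$end, month_cols)       # column position of each row's end month

# For each row r, average the columns from its start to its end position
df$average <- mapply(function(r, a, b) mean(vals[r, a:b]),
                     seq_len(nrow(df)), from, to)
```

This should agree with the apply() result up to the rounding introduced by the character conversion. It assumes every start month comes on or before its end month; otherwise a:b would count backwards.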