
How to calculate column average with range criteria given by two other variables?

Below is a sample dataset.

id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")

X1 <- runif(n=4, min=1, max=10)
X2 <- runif(n=4, min=1, max=10)
X3 <- runif(n=4, min=1, max=10)
X4 <- runif(n=4, min=1, max=10)
X5 <- runif(n=4, min=1, max=10)
X6 <- runif(n=4, min=1, max=10)
X7 <- runif(n=4, min=1, max=10)
X8 <- runif(n=4, min=1, max=10)
X9 <- runif(n=4, min=1, max=10)
X10 <- runif(n=4, min=1, max=10)
X11 <- runif(n=4, min=1, max=10)
X12 <- runif(n=4, min=1, max=10)

df <- data.frame(id,start,end,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12)

colnames(df)<-c("id","start","end","Jan 2001","Feb 2001","Mar 2001","Apr 2001","May 2001","Jun 2001",
            "Jul 2001","Aug 2001","Sep 2001","Oct 2001","Nov 2001","Dec 2001")

df
  id    start      end Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001
1  1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2  2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3  3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4  4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
  Aug 2001 Sep 2001 Oct 2001 Nov 2001 Dec 2001
1 7.393482 1.865180 5.316736 6.737959 8.783017
2 7.816893 4.021888 7.086448 1.728219 1.553020
3 5.443161 7.489278 9.848638 7.072435 1.294177
4 8.853365 8.899155 5.768139 1.414094 2.322848

I would like to calculate, for each id, the mean across the month columns from its start month to its end month (both inclusive). For example:

id start    end        average
2  Jun 2001 Sep 2001   average of Jun, Jul, Aug and Sep 2001
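
To make the target concrete, here is the same calculation done by hand for id 2, using the four values printed for that row in the sample output above. The variable names are only for illustration.

```r
# Values for id 2 in the printed sample: Jun, Jul, Aug and Sep 2001.
id2.values <- c(6.989283, 7.375734, 7.816893, 4.021888)

# The desired result for id 2 is the plain mean of those four cells.
id2.mean <- mean(id2.values)
id2.mean
```

This comes to roughly 6.55. Because the sample data is drawn with runif() and no seed, a fresh run of the code above will give different numbers.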

My first thought is to assign an index to each month, so that there is no need to deal with the yearmon data format. That seems to make it easier.

# generate index for month data
df.i <- df
df.i$start.i[df.i$start == "Jan 2001"] <- 1
df.i$start.i[df.i$start == "Feb 2001"] <- 2
df.i$start.i[df.i$start == "Mar 2001"] <- 3
df.i$start.i[df.i$start == "Apr 2001"] <- 4
df.i$start.i[df.i$start == "May 2001"] <- 5
df.i$start.i[df.i$start == "Jun 2001"] <- 6
df.i$start.i[df.i$start == "Jul 2001"] <- 7
df.i$start.i[df.i$start == "Aug 2001"] <- 8
df.i$start.i[df.i$start == "Sep 2001"] <- 9
df.i$start.i[df.i$start == "Oct 2001"] <- 10
df.i$start.i[df.i$start == "Nov 2001"] <- 11
df.i$start.i[df.i$start == "Dec 2001"] <- 12

df.i$end.i[df.i$end == "Jan 2001"] <- 1
df.i$end.i[df.i$end == "Feb 2001"] <- 2
df.i$end.i[df.i$end == "Mar 2001"] <- 3
df.i$end.i[df.i$end == "Apr 2001"] <- 4
df.i$end.i[df.i$end == "May 2001"] <- 5
df.i$end.i[df.i$end == "Jun 2001"] <- 6
df.i$end.i[df.i$end == "Jul 2001"] <- 7
df.i$end.i[df.i$end == "Aug 2001"] <- 8
df.i$end.i[df.i$end == "Sep 2001"] <- 9
df.i$end.i[df.i$end == "Oct 2001"] <- 10
df.i$end.i[df.i$end == "Nov 2001"] <- 11
df.i$end.i[df.i$end == "Dec 2001"] <- 12


colnames(df.i)<-c("id","start","end","1","2","3","4","5","6",
            "7","8","9","10","11","12","start.i","end.i")


 df.i
  id    start      end        1        2        3        4        5        6        7
1  1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2  2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3  3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4  4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
          8        9       10       11       12 start.i end.i
1 7.393482 1.865180 5.316736 6.737959 8.783017       7     8
2 7.816893 4.021888 7.086448 1.728219 1.553020       6     9
3 5.443161 7.489278 9.848638 7.072435 1.294177       5     7
4 8.853365 8.899155 5.768139 1.414094 2.322848       5    11
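
The twenty-four assignments above can probably be condensed with match(), which returns the position of each value in a lookup vector. This is a minimal sketch, assuming the start and end labels are spelled exactly like the month column names. month.labels and the small demo data frame are names introduced here for illustration only.

```r
# Lookup vector of month labels in calendar order (same spelling as the column names).
month.labels <- paste(month.abb, 2001)

# Small stand-in for the id/start/end columns of df (illustrative only).
demo <- data.frame(
  id    = 1:4,
  start = c("Jul 2001", "Jun 2001", "May 2001", "May 2001"),
  end   = c("Aug 2001", "Sep 2001", "Jul 2001", "Nov 2001"),
  stringsAsFactors = FALSE
)

# match() gives the position of each label in month.labels, i.e. the month index.
demo$start.i <- match(demo$start, month.labels)
demo$end.i   <- match(demo$end, month.labels)

demo
```

The resulting start.i and end.i columns are the same indices as in the table above (7, 6, 5, 5 and 8, 9, 7, 11). A label that does not appear in month.labels would come back as NA rather than raising an error.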

Thank you.

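# Assumption: yearmonlist is not defined in the original post; here it is taken
# to be the names of the twelve month columns of df.i, "1" to "12".
yearmonlist <- as.character(1:12)
month.i <- as.numeric(yearmonlist)

df.i$mean <- NA_real_
for (index.r in seq_len(nrow(df.i))) {
  # months between this row's start and end index, inclusive
  in.range <- month.i >= df.i$start.i[index.r] & month.i <= df.i$end.i[index.r]
  df.i$mean[index.r] <- mean(unlist(df.i[index.r, yearmonlist[in.range]]))
}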

This one seems to work.

Here is your data, with a seed set for reproducibility.

id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")
set.seed(123)
df <- data.frame(id, start, end, matrix(runif(n=4*12, min=1, max=10), ncol=12))
df$start <- as.character(df$start)
df$end <- as.character(df$end)
colnames(df)<-c("id", "start", "end", paste(month.abb, 2001))

You can try apply. It "loops" through every row and subsets the values lying between the columns named by start and end (inside the function, x[2] and x[3] are the row's start and end values, and y holds the column names of df). Importantly, the start and end values must match column names of df exactly. Finally, the mean is calculated over that subset.

apply(df, 1, function(x, y) mean(as.numeric(x[which(y == x[2]):which(y == x[3])])), colnames(df))
[1] 5.251895 6.273809 5.537480 6.815905
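
One thing to keep in mind: apply() converts the data frame to a matrix first, and because start and end are character columns, the numeric values pass through a string representation before as.numeric() turns them back, which can cost a few digits of precision. A hedged alternative that keeps the numbers numeric is to find the column positions with match() and average row by row with mapply(). The helper name range_mean and the toy data below are illustrative and not from the original post.

```r
# Toy data with known values so the result is easy to check by hand.
months <- paste(month.abb, 2001)
vals <- matrix(1:24, nrow = 2, byrow = TRUE)   # row 1: 1..12, row 2: 13..24
toy <- data.frame(id = 1:2,
                  start = c("Feb 2001", "Jun 2001"),
                  end   = c("Apr 2001", "Sep 2001"),
                  vals,
                  stringsAsFactors = FALSE)
colnames(toy) <- c("id", "start", "end", months)

# Mean of the month columns between start and end (inclusive) for each row.
range_mean <- function(d, month.cols) {
  s <- match(d$start, month.cols)   # position of the start month
  e <- match(d$end, month.cols)     # position of the end month
  m <- as.matrix(d[, month.cols])   # month columns only, so the matrix stays numeric
  mapply(function(i, from, to) mean(m[i, from:to]), seq_len(nrow(d)), s, e)
}

res <- range_mean(toy, months)
res
```

For the toy data this gives 3 (the mean of 2, 3, 4) and 19.5 (the mean of 18, 19, 20, 21). Applied to the seeded df above, it would be called as range_mean(df, paste(month.abb, 2001)).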

