How to calculate the mean of a variable between two dates

I would like to calculate the mean of a variable between two dates. Below is a reproducible data frame.

year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
      1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
      1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,
      1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997)
month <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
station <- c("A","A","A","A","A","A","A","A","A","A","A","A",
         "B","B","B","B","B","B","B","B","B","B","B","B")

concentration <- round(runif(48,20,40),1)   # random values; call set.seed() first for identical output

df <- data.frame(year,month,station,concentration)


id <- c(1,2,3,4)
station1996 <- c("A","A","B","B")
station1997 <- c("B","A","A","B")
start <- c("06/01/1996","07/01/1996","07/01/1996","08/01/1996")
end <- c("04/01/1997","04/01/1997","04/01/1997","05/01/1997")

participant <- data.frame(id,station1996,station1997,start,end)
participant$start <- as.Date(participant$start, format = "%m/%d/%Y")
participant$end <- as.Date(participant$end, format = "%m/%d/%Y")

So I have the two datasets below:

df
   year month station concentration
1  1996   JAN       A          24.4
2  1996   FEB       A          37.0
3  1996   MAR       A          39.5
4  1996   APR       A          28.0
...
45 1997   SEP       B          37.7
46 1997   OCT       B          35.2
47 1997   NOV       B          26.8
48 1997   DEC       B          40.0

participant
  id station1996 station1997      start        end
1  1           A           B 1996-06-01 1997-04-01
2  2           A           A 1996-07-01 1997-04-01
3  3           B           A 1996-07-01 1997-04-01
4  4           B           B 1996-08-01 1997-05-01

For each id, I would like to calculate the average concentration between the start and end dates (month and year). Note that the station may change between years.
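One way to make the month/year comparison concrete is to turn each year and month pair in df into a Date for the first day of that month, the same form as start and end. This is a minimal base R sketch; the helper name month_start is hypothetical, and it assumes the month column always holds upper-case English abbreviations such as "JAN":

```r
# Convert a year and an upper-case month abbreviation ("JAN".."DEC")
# to the first day of that month. Matching against toupper(month.abb)
# avoids depending on the current locale's month names.
month_start <- function(year, month) {
  as.Date(sprintf("%d-%02d-01", as.integer(year), match(month, toupper(month.abb))))
}

d <- month_start(c(1996, 1997), c("JUN", "APR"))
print(d)
```

A row of df then falls inside a participant's window when `month_start(year, month) >= start & month_start(year, month) <= end`.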

For example, for id = 1, I would like to calculate the average concentration between JUN 1996 and APR 1997. This should be based on the concentrations at station A from JUN 1996 to DEC 1996, and at station B from JAN 1997 to APR 1997.
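To pin down exactly which rows the id = 1 example refers to, here is a hand-written base R check. It rebuilds df with `rep()` (equivalent to the vectors above) and sets a seed, which is an assumption since the question's data are unseeded, so the numbers will differ from the printout:

```r
set.seed(42)  # assumption: any seed; the question's data are random
df <- data.frame(
  year          = rep(c(1996, 1997), each = 24),
  month         = rep(toupper(month.abb), times = 4),
  station       = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1)
)

mon <- match(df$month, toupper(month.abb))   # month number 1..12

# id = 1: JUN-DEC 1996 at station A, then JAN-APR 1997 at station B
rows <- (df$year == 1996 & df$station == "A" & mon >= 6) |
        (df$year == 1997 & df$station == "B" & mon <= 4)

sum(rows)                                # months used: 7 + 4 = 11
mean_id1 <- mean(df$concentration[rows])
mean_id1
```

Any general solution should reproduce this value for id = 1 on the same data.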

Can anyone help?

Thank you very much.

Here's a data.table solution. The basic idea is to enumerate all the months in the start-end range as yearmon for each id, and then use that as an index into the concentration table df. It's a bit convoluted, so hopefully someone will come along and show you a simpler way.
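The enumeration step on its own can be illustrated without data.table. This sketch expands each id into one row per month from start to end inclusive. It uses base R dates with a "%Y-%m" string key in place of zoo's yearmon, a substitution made only so it runs without extra packages:

```r
participant <- data.frame(
  id    = 1:4,
  start = as.Date(c("1996-06-01", "1996-07-01", "1996-07-01", "1996-08-01")),
  end   = as.Date(c("1997-04-01", "1997-04-01", "1997-04-01", "1997-05-01"))
)

# One row per (id, month); "YYYY-MM" stands in for yearmon
months_long <- do.call(rbind, lapply(seq_len(nrow(participant)), function(i) {
  dates <- seq(participant$start[i], participant$end[i], by = "month")
  data.frame(id   = participant$id[i],
             year = as.integer(format(dates, "%Y")),
             ym   = format(dates, "%Y-%m"))
}))

table(months_long$id)   # months per id
```

Because start and end fall on the first of a month, `seq(..., by = "month")` includes both endpoints, giving 11 months for id 1 and 10 for the others.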

library(data.table)
library(zoo)          # for as.yearmon(...)
setDT(df)             # convert to data.table
setDT(participant)
# add year-month column; %b matches the month abbreviations (English locale assumed)
df[, yrmon := as.yearmon(paste(year, month, sep = "-"), format = "%Y-%b")]
# long format: one row per (id, year) holding the station used in that year
p.melt <- reshape(participant, varying = 2:3, direction = "long", sep = "", timevar = "year")
# one row per id and month between start and end
x <- participant[, .(date = seq(start, end, by = "month")), by = id]
x[, c("year", "yrmon") := .(year(date), as.yearmon(date))]       # add year and year-month
x[p.melt, station := i.station, on = c("id", "year")]            # add station
x[df, conc := i.concentration, on = c("yrmon", "station")]       # add concentration
setorder(x, id)    # not necessary, but makes it easier to interpret x
result <- x[, .(mean.conc = mean(conc)), by = id]                # mean(conc) by id
result
#    id mean.conc
# 1:  1  28.61818
# 2:  2  28.56000
# 3:  3  28.44000
# 4:  4  29.60000

So, first we convert everything to data.tables. Then we add a yrmon column to df for indexing later. Next we create p.melt by reshaping participant to long format, so that the station is in one column and the year indicator (1996 or 1997) is in a separate column. Then we create a temporary table, x, with the sequence of monthly dates for each id, and add the year and yrmon for each of those dates. We join that with p.melt on id and year to add a station column to x, and then join x with df on yrmon and station to pick up the matching concentration. Finally, we simply average conc by id in x using mean(...).
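For comparison, the same steps (expand to months, look up the station for each year, join with df, average by id) can be sketched in base R with `merge()` and `aggregate()`. This is one possible alternative, not the answer's method; it assumes the per-year station columns are always named "station" followed by the year, and it sets a seed, so its numbers will not match the output above:

```r
set.seed(42)  # hypothetical seed; the question's data are random
df <- data.frame(
  year          = rep(c(1996, 1997), each = 24),
  month         = rep(toupper(month.abb), times = 4),
  station       = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)
participant <- data.frame(
  id          = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start       = as.Date(c("1996-06-01", "1996-07-01", "1996-07-01", "1996-08-01")),
  end         = as.Date(c("1997-04-01", "1997-04-01", "1997-04-01", "1997-05-01")),
  stringsAsFactors = FALSE
)

# "YYYY-MM" key for each row of the concentration table
df$ym <- sprintf("%d-%02d", as.integer(df$year), match(df$month, toupper(month.abb)))

# One row per id and month; the station is read from column "station<year>"
x <- do.call(rbind, lapply(seq_len(nrow(participant)), function(i) {
  p     <- participant[i, ]
  dates <- seq(p$start, p$end, by = "month")
  yr    <- format(dates, "%Y")
  data.frame(id      = p$id,
             ym      = format(dates, "%Y-%m"),
             station = vapply(yr, function(y) as.character(p[[paste0("station", y)]]),
                              character(1), USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}))

# Attach the matching concentration (inner join), then average by id
x <- merge(x, df[, c("ym", "station", "concentration")], by = c("ym", "station"))
result <- aggregate(concentration ~ id, data = x, FUN = mean)
result
```

Reading the station column by name keeps the lookup working if more years (and more stationYYYY columns) are added, at the cost of relying on that naming convention.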
