How to calculate the mean of a variable between two dates

Question

I would like to calculate the mean of a variable between two dates. Below is a reproducible data frame.

year <- rep(c(1996, 1997), each = 24)     # 48 rows: 24 per year
month <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")  # recycled to 48 rows by data.frame()
station <- rep(c("A", "B"), each = 12)    # 24 values, recycled to 48 rows by data.frame()

# No seed is set, so the concentrations (and any means computed from them) differ between runs
concentration <- round(runif(48, 20, 40), 1)

df <- data.frame(year, month, station, concentration)


id <- c(1,2,3,4)
station1996 <- c("A","A","B","B")
station1997 <- c("B","A","A","B")
start <- c("06/01/1996","07/01/1996","07/01/1996","08/01/1996")
end <- c("04/01/1997","04/01/1997","04/01/1997","05/01/1997")

participant <- data.frame(id,station1996,station1997,start,end)
participant$start <- as.Date(participant$start, format = "%m/%d/%Y")
participant$end <- as.Date(participant$end, format = "%m/%d/%Y")

So I have the two datasets below:

df
   year month station concentration
1  1996   JAN       A          24.4
2  1996   FEB       A          37.0
3  1996   MAR       A          39.5
4  1996   APR       A          28.0
...
45 1997   SEP       B          37.7
46 1997   OCT       B          35.2
47 1997   NOV       B          26.8
48 1997   DEC       B          40.0

participant
  id station1996 station1997      start        end
1  1           A           B 1996-06-01 1997-04-01
2  2           A           A 1996-07-01 1997-04-01
3  3           B           A 1996-07-01 1997-04-01
4  4           B           B 1996-08-01 1997-05-01

For each id, I would like to calculate the average concentration between the start and end dates (by month and year). Note that the station might change between years.

For example, for id = 1, I would like to calculate the average concentration between JUN 1996 and APR 1997. This should be based on the concentration from JUN 1996 to DEC 1996 at station A, and from JAN 1997 to APR 1997 at station B.

Can anyone help?

Thank you very much.

Answer 1

Here's a data.table solution. The basic idea is to enumerate all the months in the start-end range as yearmon values for each id, and then use them as an index into the concentration table df. It's a bit convoluted, so hopefully someone will come along and show you a simpler way.
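To make the enumeration step concrete, here is a minimal base R sketch for participant 1 only. The names p1_start, p1_end, dates and idx are made up for this illustration, and year and month numbers stand in for the yearmon class used in the answer.

```r
# Participant 1 from the question: 1 June 1996 to 1 April 1997
p1_start <- as.Date("1996-06-01")
p1_end   <- as.Date("1997-04-01")

# One date per calendar month in the range, both endpoints included
dates <- seq(p1_start, p1_end, by = "month")

# Year and month number identify each month without relying on locale-specific names
idx <- data.frame(year  = as.integer(format(dates, "%Y")),
                  month = as.integer(format(dates, "%m")))

nrow(idx)         # 11 months in total
table(idx$year)   # 7 months fall in 1996 and 4 in 1997
```

Each row of idx is one month that has to be looked up in the concentration table, using the 1996 station for the first seven rows and the 1997 station for the last four.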

library(data.table)
library(zoo)          # for as.yearmon(...)
setDT(df)             # convert both data frames to data.tables by reference
setDT(participant)
# add a year-month column; month names are parsed according to the current locale
df[, yrmon := as.yearmon(paste(year, month, sep="-"), format="%Y-%b")]
# long format: one row per id and year, with that year's station in a single column
p.melt <- reshape(participant, varying=2:3, direction="long", sep="", timevar="year")
# one row per id and month between start and end (inclusive)
x <- participant[, .(date=seq(start, end, by="month")), by=id]
x[, c("year","yrmon") := .(year(date), as.yearmon(date))]   # add year and year-month
x[p.melt, station := i.station, on=c("id","year")]          # add the station for that id and year
x[df, conc := i.concentration, on=c("yrmon","station")]     # add the concentration for that month and station
setorder(x, id)    # not necessary, but makes it easier to interpret x
result <- x[, .(mean.conc=mean(conc)), by=id]               # mean(conc) by id
result
# (values depend on the random concentrations, so yours will differ)
#    id mean.conc
# 1:  1  28.61818
# 2:  2  28.56000
# 3:  3  28.44000
# 4:  4  29.60000

So, first we convert everything to data.tables. Next we add a yrmon column to df for indexing later. We then create p.melt by reshaping participant to long format, so that the station is in one column and the year indicator (1996 or 1997) is in a separate column. After that we create a temporary table, x, with the sequence of monthly dates for each id, and add the year and yrmon for each of those dates. We merge that with p.melt on id and year to add a station column to x, and then use yrmon and station to merge x with df to get the appropriate concentration. Finally, we aggregate conc by id in x using mean(...).
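As a rough cross-check of the same logic, the steps can also be sketched in base R alone. This is not the method above, only an illustration under stated assumptions: conc_tbl holds made-up concentrations (10 at station A, 20 at station B) so the means can be verified by hand, and people, conc_tbl and mean_for_row are hypothetical names.

```r
# Toy concentration table (hypothetical values, not the question's random data):
# station A is always 10 and station B is always 20
conc_tbl <- expand.grid(month = 1:12, station = c("A", "B"), year = c(1996, 1997),
                        stringsAsFactors = FALSE)
conc_tbl$concentration <- ifelse(conc_tbl$station == "A", 10, 20)

# Participant table with the same ids, stations and dates as in the question
people <- data.frame(
  id = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start = as.Date(c("1996-06-01", "1996-07-01", "1996-07-01", "1996-08-01")),
  end   = as.Date(c("1997-04-01", "1997-04-01", "1997-04-01", "1997-05-01")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean concentration for the participant in row i
mean_for_row <- function(i) {
  dates <- seq(people$start[i], people$end[i], by = "month")  # one date per month
  yr <- as.integer(format(dates, "%Y"))
  mo <- as.integer(format(dates, "%m"))
  # pick the station that applies in each month's year
  st <- ifelse(yr == 1996, people$station1996[i], people$station1997[i])
  # look up each (year, month, station) combination in the concentration table
  needed <- paste(yr, mo, st)
  lookup <- paste(conc_tbl$year, conc_tbl$month, conc_tbl$station)
  mean(conc_tbl$concentration[match(needed, lookup)])
}

check <- data.frame(id = people$id,
                    mean.conc = vapply(seq_len(nrow(people)), mean_for_row, numeric(1)))
check
```

With these toy values, id 1 covers seven months at station A and four at station B, so its mean is (7 * 10 + 4 * 20) / 11, about 13.64. Ids 2 and 4 stay at a single station and give 10 and 20, and id 3 gives (6 * 20 + 4 * 10) / 10 = 16.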

Answer 1
1 2015-11-06 08:01:11
