
How to calculate the mean of a variable between two dates

I would like to calculate the mean of a variable between two dates. Below is a reproducible example.

year <- c(1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
      1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,1996,
      1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,
      1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997,1997)
month <- c("JAN","FEB","MAR","APR","MAY","JUN","JUL","AUG","SEP","OCT","NOV","DEC")
station <- c("A","A","A","A","A","A","A","A","A","A","A","A",
         "B","B","B","B","B","B","B","B","B","B","B","B")

concentration <- as.numeric(round(runif(48,20,40),1))

df <- data.frame(year,month,station,concentration)


id <- c(1,2,3,4)
station1996 <- c("A","A","B","B")
station1997 <- c("B","A","A","B")
start <- c("06/01/1996","07/01/1996","07/01/1996","08/01/1996")
end <- c("04/01/1997","04/01/1997","04/01/1997","05/01/1997")

participant <- data.frame(id,station1996,station1997,start,end)
participant$start <- as.Date(participant$start, format = "%m/%d/%Y")
participant$end <- as.Date(participant$end, format = "%m/%d/%Y")

So I have two datasets, shown below:

df
   year month station concentration
1  1996   JAN       A          24.4
2  1996   FEB       A          37.0
3  1996   MAR       A          39.5
4  1996   APR       A          28.0
...
45 1997   SEP       B          37.7
46 1997   OCT       B          35.2
47 1997   NOV       B          26.8
48 1997   DEC       B          40.0

participant
  id station1996 station1997      start        end
1  1           A           B 1996-06-01 1997-04-01
2  2           A           A 1996-07-01 1997-04-01
3  3           B           A 1996-07-01 1997-04-01
4  4           B           B 1996-08-01 1997-05-01

For each id, I would like to calculate the average concentration between the start and end dates (by month and year). Note that the station might change between years.

For example, for id=1, I would like to calculate the average concentration between JUN 1996 and APR 1997. This should be based on the concentration from JUN 1996 to DEC 1996 at station A, and from JAN 1997 to APR 1997 at station B.
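To make the id=1 case concrete, here is a minimal base R sketch that picks those eleven monthly values by hand. It rebuilds df in a compact form and adds set.seed(1) purely as an assumption so the numbers are repeatable. The names part_1996, part_1997 and selected are illustrative only.

```r
# Rebuild df compactly; set.seed() is an assumption added for repeatability
# (the original data has no seed, so its values differ from run to run).
set.seed(1)
df <- data.frame(
  year          = rep(c(1996, 1997), each = 24),
  month         = rep(toupper(month.abb), times = 4),
  station       = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)

# JUN-DEC 1996 at station A, then JAN-APR 1997 at station B
part_1996 <- df$year == 1996 & df$station == "A" &
  df$month %in% c("JUN", "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
part_1997 <- df$year == 1997 & df$station == "B" &
  df$month %in% c("JAN", "FEB", "MAR", "APR")

selected <- df[part_1996 | part_1997, ]
nrow(selected)                 # 11 monthly values (7 from 1996, 4 from 1997)
mean(selected$concentration)   # the average wanted for id = 1
```

Any general solution should reproduce this value for id=1 when run on the same df.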

Can anyone help?

Thank you very much.

Here's a data.table solution. The basic idea is to enumerate every month in the start-end range as a yearmon value for each id, and then use that as an index into the concentration table df. It's a bit convoluted, so hopefully someone will come along and show you a simpler way.

library(data.table)
library(zoo)          # for as.yearmon(...)
setDT(df)             # convert to data.table
setDT(participant)
df[, yrmon:= as.yearmon(paste(year,month,sep="-"), format="%Y-%B")]   # add year-month column
p.melt <- reshape(participant, varying=2:3, direction="long", sep="", timevar="year")   # long format: one row per id and year
x <- participant[, .(date=seq(start,end,by="month")), by=id]     # one row per id and month in the range
x[, c("year","yrmon"):=.(year(date),as.yearmon(date))]           # add year and year-month
x[p.melt, station:=station, on=c("id","year")]                   # add station
x[df, conc:= concentration, on=c("yrmon","station"), nomatch=0]  # add concentration
setorder(x,id)    # not necessary, but makes it easier to interpret x
result <- x[, .(mean.conc=mean(conc)), by=id]                    # mean(conc) by id
result            # values depend on the random draw in df (no set.seed), so yours will differ
#    id mean.conc
# 1:  1  28.61818
# 2:  2  28.56000
# 3:  3  28.44000
# 4:  4  29.60000

So, first we convert both data frames to data.tables and add a yrmon column to df for indexing later. Next we create p.melt by reshaping participant to long format, so that the station is in one column and the year indicator (1996 or 1997) is in another. We then build a temporary table, x, holding the sequence of monthly dates for each id, and add the year and yrmon of each date. Joining x with p.melt on id and year adds a station column to x, and joining x with df on yrmon and station pulls in the matching concentration. Finally, we aggregate conc by id in x using mean(...).
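For readers who would rather avoid extra packages, the same steps can be sketched in base R. This is an alternative under stated assumptions, not the method above: set.seed(1) is added only for repeatable numbers, the dates are written in ISO form, and the helper mean_conc and the key variables are hypothetical names.

```r
set.seed(1)  # assumption: added only so the example is repeatable
df <- data.frame(
  year          = rep(c(1996, 1997), each = 24),
  month         = rep(toupper(month.abb), times = 4),
  station       = rep(rep(c("A", "B"), each = 12), times = 2),
  concentration = round(runif(48, 20, 40), 1),
  stringsAsFactors = FALSE
)
participant <- data.frame(
  id          = 1:4,
  station1996 = c("A", "A", "B", "B"),
  station1997 = c("B", "A", "A", "B"),
  start       = as.Date(c("1996-06-01", "1996-07-01", "1996-07-01", "1996-08-01")),
  end         = as.Date(c("1997-04-01", "1997-04-01", "1997-04-01", "1997-05-01")),
  stringsAsFactors = FALSE
)

# Numeric month (1-12), so matching does not rely on locale-dependent parsing
df$mon <- match(df$month, toupper(month.abb))
key_df <- paste(df$year, df$mon, df$station)

# Mean concentration for the i-th participant (hypothetical helper)
mean_conc <- function(i) {
  p     <- participant[i, ]
  dates <- seq(p$start, p$end, by = "month")      # every month in the range
  yr    <- as.integer(format(dates, "%Y"))
  mon   <- as.integer(format(dates, "%m"))
  stn   <- ifelse(yr == 1996, p$station1996, p$station1997)  # station for that year
  key_x <- paste(yr, mon, stn)
  mean(df$concentration[match(key_x, key_df)])    # look up and average
}

result <- data.frame(
  id        = participant$id,
  mean.conc = vapply(seq_len(nrow(participant)), mean_conc, numeric(1))
)
result
```

The lookup key combines year, month number and station, which plays the same role as the yrmon and station join in the data.table version. The ifelse() step assumes only the two years 1996 and 1997 occur, as in the example data.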

