简体   繁体   中英

Summarize R data frame based on a date range in a second data frame

I have two data frames, one that includes data by day, and one that includes data by irregular time multi-day intervals. For example:

A data frame precip_range with precipitation data by irregular time intervals:

start_date<-as.Date(c("2010-11-01", "2010-11-04", "2010-11-10"))
end_date<-as.Date(c("2010-11-03", "2010-11-09", "2010-11-12"))
precipitation<-(c(12, 8, 14))
precip_range<-data.frame(start_date, end_date, precipitation)

And a data frame precip_daily with daily precipitation data:

day<-as.Date(c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-04", "2010-11-05",
                  "2010-11-06", "2010-11-07", "2010-11-08", "2010-11-09", "2010-11-10",
                  "2010-11-11", "2010-11-12"))
precip<-(c(3, 1, 2, 1, 0.25, 1, 3, 0.33, 0.75, 0.5, 1, 2))
precip_daily<-data.frame(day, precip)

In this example, precip_daily represents daily precipitation estimated by a model and precip_range represents measured cumulative precipitation for specific date ranges. I am trying to compare modeled to measured data, which requires synchronizing the time periods.

So, I want to summarize the precip column in data frame precip_daily (count of observations and sum of precip ) by the date date ranges between start_date and end_date in the data frame precip_range . Any thoughts on the best way to do this?

You can use the start_dates from precip_range as breaks to cut() to group your daily values. For example

rng <- cut(precip_daily$day, 
    breaks=c(precip_range$start_date, max(precip_range$end_date)), 
    include.lowest=T)

Here we cut the values in daily using the start dates in the range data.frame. We're sure to include the lowest value and stop at the largest end value. If we merge that with the daily values we see

cbind(precip_daily, rng)

#           day precip        rng
# 1  2010-11-01   3.00 2010-11-01
# 2  2010-11-02   1.00 2010-11-01
# 3  2010-11-03   2.00 2010-11-01
# 4  2010-11-04   1.00 2010-11-04
# 5  2010-11-05   0.25 2010-11-04
# 6  2010-11-06   1.00 2010-11-04
# 7  2010-11-07   3.00 2010-11-04
# 8  2010-11-08   0.33 2010-11-04
# 9  2010-11-09   0.75 2010-11-04
# 10 2010-11-10   0.50 2010-11-10
# 11 2010-11-11   1.00 2010-11-10
# 12 2010-11-12   2.00 2010-11-10

which shows that the values have been grouped. Then we can do

aggregate(cbind(count=1, sum=precip_daily$precip)~rng, FUN=sum)

#          rng count  sum
# 1 2010-11-01     3 6.00
# 2 2010-11-04     6 6.33
# 3 2010-11-10     3 3.50

To get the total for each of those ranges (ranges as labeled with the start date)

Or

library(zoo)
library(data.table)
temp <- merge(precip_daily, precip_range, by.x = "day", by.y = "start_date", all.x = T)
temp$end_date <- na.locf(temp$end_date)
setDT(temp)[, list(Sum = sum(precip), Count = .N), by = end_date]

##     end_date  Sum Count
## 1: 2010-11-03 6.00     3
## 2: 2010-11-09 6.33     6
## 3: 2010-11-12 3.50     3

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM