How to calculate mean over date range in two data frames

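I am using R and have two data frames: one has columns for start date, end date, and station code, while the other holds daily salinity measurements along with a station code. I want to use the daily salinity measurements in the second data frame to calculate the mean salinity between the start date and end date given in the first data frame.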


> head(events)
# A tibble: 6 x 5
  event_no duration date_start date_end   StationCode
     <dbl>    <dbl> <date>     <date>     <chr>      
1        1        4 2003-01-01 2003-01-04 niwtawq    
2        2        5 2003-01-06 2003-01-10 niwtawq    
3        3        7 2004-05-25 2004-05-31 niwtawq    
4        4        6 2004-10-31 2004-11-05 niwtawq    
5        5        7 2006-08-02 2006-08-08 niwtawq    
6        6        5 2007-08-07 2007-08-11 niwtawq 


> head(dat4)
   StationCode DateFormatted  Sal
1:     niwtawq    2003-01-01 1.58
2:     niwtawq    2003-01-02 1.19
3:     niwtawq    2003-01-03 1.31
4:     niwtawq    2003-01-04 1.56
5:     niwtawq    2003-01-05 2.10
6:     niwtawq    2003-01-06 1.33
7:     niwtawq    2003-01-07 1.68
8:     niwtawq    2003-01-08 1.83
9:     niwtawq    2003-01-09 1.77
10:     niwtawq    2003-01-10 1.56


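# Create one daily date sequence per event, named by station code.
# as.character() keeps the names when StationCode is a factor; SIMPLIFY = FALSE always returns a list.
ranges <- mapply(function(x, y, z) seq.Date(y, z, by = "day"),
  as.character(events$StationCode), events$date_start, events$date_end,
  SIMPLIFY = FALSE, USE.NAMES = TRUE)

# Match by date and station, then average the matched salinity values
events$meansalinity <- mapply(function(a, b) {
  same_station <- dat4$StationCode == b
  mean(dat4$Sal[same_station][match(a, dat4$DateFormatted[same_station])])
}, ranges, names(ranges))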


  event_no duration date_start   date_end StationCode meansalinity
1        1        4 2003-01-01 2003-01-04     niwtawq        1.410
2        2        5 2003-01-06 2003-01-10     niwtawq        1.634
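
As a quick sanity check on the output above, the first two means can be reproduced by hand from the ten rows shown in head(dat4). This is only a sketch: the vectors `sal` and `dates` are typed in from that printout for illustration, and `event1` / `event2` are names made up here.

```r
# Daily salinity values copied from the head(dat4) printout (illustrative copy, not the full data)
sal <- c(1.58, 1.19, 1.31, 1.56, 2.10, 1.33, 1.68, 1.83, 1.77, 1.56)
dates <- seq(as.Date("2003-01-01"), by = "day", length.out = 10)

# Event 1 runs 2003-01-01 to 2003-01-04; event 2 runs 2003-01-06 to 2003-01-10
event1 <- mean(sal[dates >= as.Date("2003-01-01") & dates <= as.Date("2003-01-04")])
event2 <- mean(sal[dates >= as.Date("2003-01-06") & dates <= as.Date("2003-01-10")])

event1  # 1.41  = (1.58 + 1.19 + 1.31 + 1.56) / 4
event2  # 1.634 = (1.33 + 1.68 + 1.83 + 1.77 + 1.56) / 5
```

Both values agree with the meansalinity column above. The measurement on 2003-01-05 falls between the two events and is not used in either mean.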

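One approach using the tidyverse is to create a sequence between date_start and date_end, join it with dat4 by "StationCode", filter the rows that are in range (i.e. between the start and end dates), then group_by event_no, date_start, date_end, and StationCode and calculate the mean.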


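library(dplyr)
library(purrr)
library(tidyr)

events %>%
  # one row per day of each event
  mutate(date = map2(date_start, date_end, seq, by = "day")) %>%
  unnest(date) %>%
  # attach every measurement from the same station, then keep those in range
  left_join(dat4, by = 'StationCode') %>%
  filter(DateFormatted >= date_start & DateFormatted <= date_end) %>%
  group_by(event_no, date_start, date_end, StationCode) %>%
  summarise(Sal = mean(Sal))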

# event_no date_start date_end   StationCode   Sal
#     <int> <date>     <date>     <fct>       <dbl>
#1        1 2003-01-01 2003-01-04 niwtawq      1.41
#2        2 2003-01-06 2003-01-10 niwtawq      1.63
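
The range filter can also be written without a join or an expanded date sequence, by subsetting the daily table once per event. The following is a minimal base R sketch, not the method from either answer: the helper `mean_sal_by_event` and the small `events_demo` / `daily_demo` data frames are hypothetical names introduced here, and it assumes the column names shown in the question. Unlike the pipeline above, it keeps events that have no matching measurements and returns NA for them.

```r
# Hypothetical helper: mean salinity per event by direct subsetting (base R only)
mean_sal_by_event <- function(events, daily) {
  vapply(seq_len(nrow(events)), function(i) {
    # rows from the same station whose date lies inside the event's range
    in_range <- as.character(daily$StationCode) == as.character(events$StationCode[i]) &
      daily$DateFormatted >= events$date_start[i] &
      daily$DateFormatted <= events$date_end[i]
    # NA when no measurement falls inside the event
    if (any(in_range)) mean(daily$Sal[in_range]) else NA_real_
  }, numeric(1))
}

# Small illustrative inputs based on the printouts in the question
events_demo <- data.frame(
  event_no = 1:3,
  date_start = as.Date(c("2003-01-01", "2003-01-06", "2004-05-25")),
  date_end = as.Date(c("2003-01-04", "2003-01-10", "2004-05-31")),
  StationCode = "niwtawq"
)
daily_demo <- data.frame(
  StationCode = "niwtawq",
  DateFormatted = seq(as.Date("2003-01-01"), by = "day", length.out = 10),
  Sal = c(1.58, 1.19, 1.31, 1.56, 2.10, 1.33, 1.68, 1.83, 1.77, 1.56)
)

events_demo$meansalinity <- mean_sal_by_event(events_demo, daily_demo)
events_demo$meansalinity  # 1.410 1.634 NA
```

The loop does one pass over the daily table per event, so it is likely to be slower than a join when there are many events, but it avoids creating one row per event-day combination.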


events <- structure(list(event_no = 1:6, duration = c(4L, 5L, 7L, 6L, 7L, 
5L), date_start = structure(c(12053, 12058, 12563, 12722, 13362, 
13732), class = "Date"), date_end = structure(c(12056, 12062, 
12569, 12727, 13368, 13736), class = "Date"), StationCode = 
structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "niwtawq", class = "factor")), 
row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

dat4 <- structure(list(StationCode = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "niwtawq", class = "factor"),
DateFormatted = structure(c(12053, 12054, 12055, 12056, 12057, 12058, 12059, 12060, 
12061, 12062), class = "Date"), Sal = c(1.58, 1.19, 1.31, 1.56, 2.1, 1.33, 
1.68, 1.83, 1.77, 1.56)), row.names = c("1:", "2:", "3:", "4:", 
"5:", "6:", "7:", "8:", "9:", "10:"), class = "data.frame")


