How to calculate mean over date range in two data frames

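I am using R and have two data frames: one contains columns for start date, end date, and station code, and the other contains daily salinity measurements along with the station code. I want to use the daily salinity measurements in the second data frame to calculate the mean salinity between each start date and end date in the first data frame.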

Here is the first data frame:

> head(events)
# A tibble: 6 x 5
  event_no duration date_start date_end   StationCode
     <dbl>    <dbl> <date>     <date>     <chr>      
1        1        4 2003-01-01 2003-01-04 niwtawq    
2        2        5 2003-01-06 2003-01-10 niwtawq    
3        3        7 2004-05-25 2004-05-31 niwtawq    
4        4        6 2004-10-31 2004-11-05 niwtawq    
5        5        7 2006-08-02 2006-08-08 niwtawq    
6        6        5 2007-08-07 2007-08-11 niwtawq 

Here is the second:

> head(dat4)
   StationCode DateFormatted  Sal
1:     niwtawq    2003-01-01 1.58
2:     niwtawq    2003-01-02 1.19
3:     niwtawq    2003-01-03 1.31
4:     niwtawq    2003-01-04 1.56
5:     niwtawq    2003-01-05 2.10
6:     niwtawq    2003-01-06 1.33
7:     niwtawq    2003-01-07 1.68
8:     niwtawq    2003-01-08 1.83
9:     niwtawq    2003-01-09 1.77
10:     niwtawq    2003-01-10 1.56

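Assuming the dates are formatted correctly (class Date), you can create a sequence of dates from each start date to its end date, then index the second data frame's salinity values by matching on those dates.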

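# Here df1 is the events table and df2 the daily measurements (dat4 in the question)
# Create one daily date sequence per event, named by station code
ranges <- mapply(function(x, y, z) seq.Date(y, z, 1),
                 as.character(df1$StationCode), df1$date_start, df1$date_end,
                 SIMPLIFY = FALSE, USE.NAMES = TRUE)

# Match by date and station, then average the matched salinity values
df1$meansalinity <- mapply(function(a, b)
  mean(df2$Sal[df2$StationCode == b][match(a, df2$DateFormatted[df2$StationCode == b])]),
  ranges, names(ranges))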

df1

  event_no duration date_start   date_end StationCode meansalinity
1        1        4 2003-01-01 2003-01-04     niwtawq        1.410
2        2        5 2003-01-06 2003-01-10     niwtawq        1.634
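
As a cross-check on the figures above, the same means can be obtained by filtering the daily readings directly on each event's date range instead of building date sequences. This is only a sketch and not the answerer's method; the two small data frames are stand-ins rebuilt from the first two events and the ten readings shown in the question, and the variable `in_range` is introduced purely for illustration.

```r
# Stand-ins for the question's tables (first two events, ten daily readings)
events <- data.frame(
  event_no    = 1:2,
  date_start  = as.Date(c("2003-01-01", "2003-01-06")),
  date_end    = as.Date(c("2003-01-04", "2003-01-10")),
  StationCode = "niwtawq",
  stringsAsFactors = FALSE
)
dat4 <- data.frame(
  StationCode   = "niwtawq",
  DateFormatted = seq(as.Date("2003-01-01"), as.Date("2003-01-10"), by = "day"),
  Sal           = c(1.58, 1.19, 1.31, 1.56, 2.10, 1.33, 1.68, 1.83, 1.77, 1.56),
  stringsAsFactors = FALSE
)

# For each event, average the readings from the same station whose date
# falls inside [date_start, date_end] (both ends inclusive)
events$meansalinity <- vapply(seq_len(nrow(events)), function(i) {
  in_range <- dat4$StationCode == events$StationCode[i] &
    dat4$DateFormatted >= events$date_start[i] &
    dat4$DateFormatted <= events$date_end[i]
  mean(dat4$Sal[in_range])
}, numeric(1))

events
```

For the first event this averages 1.58, 1.19, 1.31 and 1.56, giving 1.41, and for the second it gives 1.634, in line with the output above. An event with no readings in its range would yield NaN here, whereas the match-based version yields NA.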

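One way with tidyverse is to create a sequence between date_start and date_end, join it with dat4 by "StationCode", filter the rows that are in range (i.e. between the start and end dates), then group_by event_no, date_start, date_end, and StationCode to calculate the mean.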

library(tidyverse)

events %>%
  mutate(date = map2(date_start, date_end, seq, by = "day")) %>%
  unnest(date) %>%
  left_join(dat4, by = 'StationCode') %>%
  filter(DateFormatted >= date_start & DateFormatted <= date_end) %>%
  group_by(event_no, date_start, date_end, StationCode) %>%
  summarise(Sal = mean(Sal))

# event_no date_start date_end   StationCode   Sal
#     <int> <date>     <date>     <fct>       <dbl>
#1        1 2003-01-01 2003-01-04 niwtawq      1.41
#2        2 2003-01-06 2003-01-10 niwtawq      1.63

Data

events <- structure(list(event_no = 1:6, duration = c(4L, 5L, 7L, 6L, 7L, 
5L), date_start = structure(c(12053, 12058, 12563, 12722, 13362, 
13732), class = "Date"), date_end = structure(c(12056, 12062, 
12569, 12727, 13368, 13736), class = "Date"), StationCode = 
structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "niwtawq", class = "factor")), 
row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

dat4 <- structure(list(StationCode = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "niwtawq", class = "factor"),
DateFormatted = structure(c(12053, 12054, 12055, 12056, 12057, 12058, 12059, 12060, 
12061, 12062), class = "Date"), Sal = c(1.58, 1.19, 1.31, 1.56, 2.1, 1.33, 
1.68, 1.83, 1.77, 1.56)), row.names = c("1:", "2:", "3:", "4:", 
"5:", "6:", "7:", "8:", "9:", "10:"), class = "data.frame")

