How to calculate mean over date range in two data frames
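I'm using R and have two data frames: one has columns for the start date, end date, and station code, and the other holds daily salinity measurements along with the station code. I want to use the daily salinity measurements in the second data frame to calculate the mean salinity between the start and end dates given in the first data frame.
Here is the first data frame: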
> head(events)
# A tibble: 6 x 5
event_no duration date_start date_end StationCode
<dbl> <dbl> <date> <date> <chr>
1 1 4 2003-01-01 2003-01-04 niwtawq
2 2 5 2003-01-06 2003-01-10 niwtawq
3 3 7 2004-05-25 2004-05-31 niwtawq
4 4 6 2004-10-31 2004-11-05 niwtawq
5 5 7 2006-08-02 2006-08-08 niwtawq
6 6 5 2007-08-07 2007-08-11 niwtawq
Here is the second:
> head(dat4)
StationCode DateFormatted Sal
1: niwtawq 2003-01-01 1.58
2: niwtawq 2003-01-02 1.19
3: niwtawq 2003-01-03 1.31
4: niwtawq 2003-01-04 1.56
5: niwtawq 2003-01-05 2.10
6: niwtawq 2003-01-06 1.33
7: niwtawq 2003-01-07 1.68
8: niwtawq 2003-01-08 1.83
9: niwtawq 2003-01-09 1.77
10: niwtawq 2003-01-10 1.56
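Assuming the date columns are already of class Date, you can build a daily date sequence from each start and end date, then index the second data frame's salinity values by matching on those dates.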
# Build one daily date sequence per event; SIMPLIFY = FALSE always returns a list
ranges <- mapply(function(y, z) seq.Date(y, z, by = 1), events$date_start, events$date_end, SIMPLIFY = FALSE)
# For each event, match its dates against the readings from the same station
events$meansalinity <- mapply(function(a, b)
mean(dat4$Sal[dat4$StationCode == b][match(a, dat4$DateFormatted[dat4$StationCode == b])]), ranges, as.character(events$StationCode))
events
event_no duration date_start date_end StationCode meansalinity
1 1 4 2003-01-01 2003-01-04 niwtawq 1.410
2 2 5 2003-01-06 2003-01-10 niwtawq 1.634
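As a variation on the matching approach above, the date sequence can be skipped by subsetting the readings directly with range comparisons. The following is only a sketch under a few assumptions: `mean_in_range()` is a hypothetical helper introduced here for illustration, the sample data is retyped from the question (first two events only), and `na.rm = TRUE` is a choice that ignores missing days, whereas the `match()` version returns NA when any day in the range has no reading.

```r
# Sample data retyped from the question (first two events, ten daily readings)
events <- data.frame(
  event_no    = 1:2,
  date_start  = as.Date(c("2003-01-01", "2003-01-06")),
  date_end    = as.Date(c("2003-01-04", "2003-01-10")),
  StationCode = "niwtawq",
  stringsAsFactors = FALSE
)
dat4 <- data.frame(
  StationCode   = "niwtawq",
  DateFormatted = seq(as.Date("2003-01-01"), by = "day", length.out = 10),
  Sal           = c(1.58, 1.19, 1.31, 1.56, 2.10, 1.33, 1.68, 1.83, 1.77, 1.56),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean salinity for one station within [start, end]
mean_in_range <- function(station, start, end, readings) {
  keep <- readings$StationCode == station &
    readings$DateFormatted >= start &
    readings$DateFormatted <= end
  mean(readings$Sal[keep], na.rm = TRUE)
}

# Apply the helper to every event row; vapply guarantees a numeric result
events$meansalinity <- vapply(
  seq_len(nrow(events)),
  function(i) mean_in_range(events$StationCode[i],
                            events$date_start[i],
                            events$date_end[i],
                            dat4),
  numeric(1)
)
events
```

With this sample data the two events should come out at 1.41 and 1.634, the same values as the output above. An event with no readings in its range yields NaN rather than NA.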
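Using tidyverse
One approach is to create a date sequence between date_start and date_end, join the result to dat4 by "StationCode", filter to keep only the rows that fall within the range (that is, between the start and end dates), then group_by event_no, date_start, date_end and StationCode and compute the mean.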
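library(tidyverse)
events %>%
  # Expand each event into one row per day
  mutate(date = map2(date_start, date_end, seq, by = "day")) %>%
  unnest(date) %>%
  # Pair every expanded row with every reading from the same station
  left_join(dat4, by = 'StationCode') %>%
  # Keep only readings inside the event window; each reading is repeated
  # once per expanded day, which leaves the mean unchanged
  filter(DateFormatted >= date_start & DateFormatted <= date_end) %>%
  group_by(event_no, date_start, date_end, StationCode) %>%
  summarise(Sal = mean(Sal))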
# event_no date_start date_end StationCode Sal
# <int> <date> <date> <fct> <dbl>
#1 1 2003-01-01 2003-01-04 niwtawq 1.41
#2 2 2003-01-06 2003-01-10 niwtawq 1.63
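The pipeline above joins on "StationCode" alone, so every expanded day is paired with every reading from that station before the filter runs. A possible refinement, sketched here under the assumption that dplyr (1.0.0 or later, for `.groups`), tidyr and purrr are installed, is to join on the expanded date as well, which makes the filter step unnecessary. The sample data is retyped from the question (first two events only) and `result` is just an illustrative name.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data retyped from the question (first two events, ten daily readings)
events <- tibble(
  event_no    = 1:2,
  date_start  = as.Date(c("2003-01-01", "2003-01-06")),
  date_end    = as.Date(c("2003-01-04", "2003-01-10")),
  StationCode = "niwtawq"
)
dat4 <- tibble(
  StationCode   = "niwtawq",
  DateFormatted = seq(as.Date("2003-01-01"), by = "day", length.out = 10),
  Sal           = c(1.58, 1.19, 1.31, 1.56, 2.10, 1.33, 1.68, 1.83, 1.77, 1.56)
)

result <- events %>%
  # One row per day of each event
  mutate(date = map2(date_start, date_end, seq, by = "day")) %>%
  unnest(date) %>%
  # Match each day to the reading for the same station and date
  inner_join(dat4, by = c("StationCode", "date" = "DateFormatted")) %>%
  group_by(event_no, date_start, date_end, StationCode) %>%
  summarise(Sal = mean(Sal), .groups = "drop")

result
```

Because `inner_join()` drops days without a reading, events with no matching data disappear from the result, which mirrors what the filter does in the original pipeline.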
Data
events <- structure(list(event_no = 1:6, duration = c(4L, 5L, 7L, 6L, 7L,
5L), date_start = structure(c(12053, 12058, 12563, 12722, 13362,
13732), class = "Date"), date_end = structure(c(12056, 12062,
12569, 12727, 13368, 13736), class = "Date"), StationCode =
structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "niwtawq", class = "factor")),
row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
dat4 <- structure(list(StationCode = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "niwtawq", class = "factor"),
DateFormatted = structure(c(12053, 12054, 12055, 12056, 12057, 12058, 12059, 12060,
12061, 12062), class = "Date"), Sal = c(1.58, 1.19, 1.31, 1.56, 2.1, 1.33,
1.68, 1.83, 1.77, 1.56)), row.names = c("1:", "2:", "3:", "4:",
"5:", "6:", "7:", "8:", "9:", "10:"), class = "data.frame")