[英]R: Summarize groups of rows in one data.frame with groups based on whether a date column's value falls in a time range in another data.frame
我试图在数据帧 start_end 中每一行给出的时间帧之间找到数据帧df_count (汽车、公共汽车、卡车)中每一列的行总和
例如, start_end的第 1 行范围从 2021-06-12 00:15:00 到 2021-06-12 00:55:00。
我想在df_count的第 1 列(第 5 到 12 行)中找到这些时间戳之间的汽车行总和(例如)
df_count <- structure(list(date = structure(c(1623456000, 1623456300, 1623456600,
1623456900, 1623457200, 1623457500, 1623457800, 1623458100, 1623458400,
1623458700, 1623459000, 1623459300, 1623459600, 1623459900, 1623460200,
1623460500, 1623460800, 1623461100, 1623461400, 1623461700, 1623462000,
1623462300, 1623462600, 1623462900, 1623463200, 1623463500, 1623463800,
1623464100, 1623464400, 1623464700), tzone = "UTC", class = c("POSIXct",
"POSIXt")), cars = c(45, 45, 45, 52, 52, 52, 46, 46, 46, 34,
34, 34, 29, 29, 29, 36, 36, 36, 17, 17, 17, 18, 18, 18, 14, 14,
14, 3, 3, 3), buses = c(4, 4, 4, 7, 7, 7, 5, 5, 5, 4, 4, 4, 5,
5, 5, 4, 4, 4, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1), trucks = c(3,
3, 3, 2, 2, 2, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2,
2, 2, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -30L), class = c("tbl_df",
"tbl", "data.frame"))
start_end <- structure(list(start_co2_plume = c("2021-06-12 00:15:00", "2021-06-12 00:55:00",
"2021-06-12 01:15:00", "2021-06-12 01:30:00", "2021-06-12 02:00:00",
"2021-06-12 02:25:00", "2021-06-12 03:00:00", "2021-06-12 03:20:00",
"2021-06-12 03:45:00", "2021-06-12 03:55:00", "2021-06-12 04:20:00",
"2021-06-12 04:35:00", "2021-06-12 04:50:00", "2021-06-12 05:10:00",
"2021-06-12 05:40:00", "2021-06-12 05:50:00", "2021-06-12 06:00:00",
"2021-06-12 06:10:00", "2021-06-12 06:25:00", "2021-06-12 06:35:00",
"2021-06-12 06:45:00", "2021-06-12 06:55:00", "2021-06-12 08:10:00",
"2021-06-12 08:30:00", "2021-06-12 08:55:00", "2021-06-12 09:45:00",
"2021-06-12 10:05:00", "2021-06-12 10:35:00", "2021-06-12 11:05:00",
"2021-06-12 11:25:00"), end_co2_plume = c("2021-06-12 00:55:00",
"2021-06-12 01:15:00", "2021-06-12 01:30:00", "2021-06-12 02:00:00",
"2021-06-12 02:25:00", "2021-06-12 03:00:00", "2021-06-12 03:20:00",
"2021-06-12 03:35:00", "2021-06-12 03:55:00", "2021-06-12 04:10:00",
"2021-06-12 04:35:00", "2021-06-12 04:50:00", "2021-06-12 05:10:00",
"2021-06-12 05:30:00", "2021-06-12 05:50:00", "2021-06-12 06:00:00",
"2021-06-12 06:10:00", "2021-06-12 06:25:00", "2021-06-12 06:35:00",
"2021-06-12 06:45:00", "2021-06-12 06:55:00", "2021-06-12 07:10:00",
"2021-06-12 08:30:00", "2021-06-12 08:55:00", "2021-06-12 09:10:00",
"2021-06-12 10:05:00", "2021-06-12 10:25:00", "2021-06-12 10:50:00",
"2021-06-12 11:25:00", "2021-06-12 11:45:00")), row.names = c(NA,
30L), class = "data.frame")
下面产生所需的输出。 有必要假设日期的时区,所以我假设它们来自同一个时区。
library(purrr)
library(dplyr)
# Convert dates in start_end from character vectors to date classes
# Assumes the times are in the same time zone
start_end <- start_end %>% mutate(start_date = as.POSIXct(start_co2_plume, tz = "UTC"),
end_date = as.POSIXct(end_co2_plume, tz = "UTC"))
# For each row in start_end, subset df_count to the rows whose dates fall in
# the the interval defined by the start_date and end_date values for that row.
# For each automobile column, sum the values and add an index to tell us which
# interval it came from.
results <-
pmap(list(start_end$start_date, start_end$end_date, 1:nrow(start_end)),
function(start, end, ind) {
df_count %>%
filter((date >= start) & (date < end)) %>%
select(-date) %>%
summarise(across(everything(), sum)) %>%
mutate(interval_id = ind,
start = start,
end = end)
})
# Combine into a single data.frame
results %>% bind_rows()
#> # A tibble: 30 x 6
#> cars buses trucks interval_id start end
#> <dbl> <dbl> <dbl> <int> <dttm> <dttm>
#> 1 362 44 26 1 2021-06-12 00:15:00 2021-06-12 00:55:00
#> 2 121 19 13 2 2021-06-12 00:55:00 2021-06-12 01:15:00
#> 3 108 12 9 3 2021-06-12 01:15:00 2021-06-12 01:30:00
#> 4 105 12 15 4 2021-06-12 01:30:00 2021-06-12 02:00:00
#> 5 48 5 5 5 2021-06-12 02:00:00 2021-06-12 02:25:00
#> 6 3 1 1 6 2021-06-12 02:25:00 2021-06-12 03:00:00
#> 7 0 0 0 7 2021-06-12 03:00:00 2021-06-12 03:20:00
#> 8 0 0 0 8 2021-06-12 03:20:00 2021-06-12 03:35:00
#> 9 0 0 0 9 2021-06-12 03:45:00 2021-06-12 03:55:00
#> 10 0 0 0 10 2021-06-12 03:55:00 2021-06-12 04:10:00
#> # ... with 20 more rows
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.