Calculating average of a column based on multiple time periods
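I need help figuring out how to calculate the average of a variable every ___ hours. I want to calculate the average every 1/2 hour, and then every 1, 2, 4, and 6 hours.
Here is my dataset: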
dput(head(R3L12, 10))
structure(list(Date = c("2015-05-23", "2015-05-23", "2015-05-23",
"2015-05-23", "2015-05-23", "2015-05-23", "2015-05-23", "2015-05-23",
"2015-05-23", "2015-05-23"), Time = c("07:25:00", "07:40:00",
"07:45:00", "09:10:00", "11:45:00", "11:55:00", "12:05:00", "12:35:00",
"12:45:00", "13:30:00"), Turtle = structure(c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("R3L1", "R3L11", "R3L12",
"R3L2", "R3L4", "R3L8", "R3L9", "R4L8", "R8L1", "R8L4", "R8NAT123"
), class = "factor"), Tex = c(11.891, 12.008, 12.055, 13.219,
18.727, 18.992, 19.477, 20.367, 20.641, 28.305), m.Tb = c(12.477,
12.54, 12.54, 12.978, 16.362, 16.612, 17.238, 19.617, 19.993,
24.371), m.HR = c(7.56457, 6.66759, 17.51107, 9.72277, 19.44553,
13.07674, 28.115, 14.99467, 17.16947, 40.40479), season = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("beginning",
"end", "middle"), class = "factor"), year = c(2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L), Mass = c(360L,
360L, 360L, 360L, 360L, 360L, 360L, 360L, 360L, 360L)), row.names = c(NA,
10L), class = "data.frame")
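I want to calculate the average m.Tb for each time period on each date. For example, for 2015-05-23 I want the average m.Tb for every 30 minutes, 1 hour, 2 hours, 4 hours, and 6 hours, and then I want to repeat this for the next day. Sometimes rows are "missing" from the Time column; that is because the rows containing NA were removed.
Please let me know if you need clarification or have questions, as I'm still new to R.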
We can use ceiling_date from lubridate:
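library(lubridate)
library(dplyr)
library(stringr)
R3L12 %>%
  # Combine Date and Time into one date-time, then round it up to the next 30-minute boundary
  group_by(DS = ceiling_date(as.POSIXct(str_c(Date, Time, sep = " ")),
                            unit = '30 min')) %>%
  # Average m.Tb within each 30-minute bin
  summarise(avg_30 = mean(m.Tb)) %>%
  # Keep the calendar date so results can be compared day by day
  mutate(date = as.Date(DS))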
-output
# A tibble: 7 x 3
# DS avg_30 date
# <dttm> <dbl> <date>
#1 2015-05-23 07:30:00 12.5 2015-05-23
#2 2015-05-23 08:00:00 12.5 2015-05-23
#3 2015-05-23 09:30:00 13.0 2015-05-23
#4 2015-05-23 12:00:00 16.5 2015-05-23
#5 2015-05-23 12:30:00 17.2 2015-05-23
#6 2015-05-23 13:00:00 19.8 2015-05-23
#7 2015-05-23 13:30:00 24.4 2015-05-23
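If you want all of the requested widths at once without extra packages, the same ceiling-style binning can be sketched in base R. This is a minimal sketch, not part of the answer above: it assumes Date is "YYYY-MM-DD" and Time is "HH:MM:SS" (both character, as in the dput output), re-enters only the Date, Time and m.Tb columns of the sample, and uses hypothetical helper names minutes_of_day() and mean_by_ceiling(). Because it groups by Date as well as by bin, each day is summarised separately, and missing rows simply produce no bin.

```r
# Sample data re-entered from the dput output (only the columns used here).
turtle <- data.frame(
  Date = rep("2015-05-23", 10),
  Time = c("07:25:00", "07:40:00", "07:45:00", "09:10:00", "11:45:00",
           "11:55:00", "12:05:00", "12:35:00", "12:45:00", "13:30:00"),
  m.Tb = c(12.477, 12.54, 12.54, 12.978, 16.362, 16.612, 17.238,
           19.617, 19.993, 24.371)
)

# Hypothetical helper: minutes since midnight for "HH:MM:SS" strings.
minutes_of_day <- function(time_chr) {
  parts <- strsplit(time_chr, ":", fixed = TRUE)
  vapply(parts, function(p) {
    as.numeric(p[1]) * 60 + as.numeric(p[2]) + as.numeric(p[3]) / 60
  }, numeric(1))
}

# Hypothetical helper: mean m.Tb per Date and per bin of `width_min` minutes.
# Each time is rounded UP to the next multiple of width_min; a time exactly on
# a boundary (e.g. 13:30 with 30-minute bins) keeps its own label.
# Simplification: times after the last boundary of a day get the label "24:00"
# on the same Date instead of rolling over to the next day.
mean_by_ceiling <- function(df, width_min) {
  df$bin_end <- ceiling(minutes_of_day(df$Time) / width_min) * width_min
  out <- aggregate(m.Tb ~ Date + bin_end, data = df, FUN = mean)
  out <- out[order(out$Date, out$bin_end), ]
  # Label each bin by its end time, e.g. "07:30".
  out$bin_end <- sprintf("%02d:%02d", out$bin_end %/% 60, out$bin_end %% 60)
  rownames(out) <- NULL
  out
}

# One data frame per bin width, stored in a named list.
widths <- c("30 min" = 30, "1 hour" = 60, "2 hour" = 120,
            "4 hour" = 240, "6 hour" = 360)
results <- lapply(widths, function(w) mean_by_ceiling(turtle, w))
results[["30 min"]]
```

With this sample, the 30-minute element should line up with the ceiling_date() output above (bins ending 07:30, 08:00, 09:30, 12:00, 12:30, 13:00, 13:30).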
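I hope this is what you are looking for. Because the resulting data frames have different numbers of rows, I had to store them in a list. To do that, I first created a character vector of all the time spans you want averages for, then used map from purrr to loop over it, passing each span as the breaks argument of cut to create the bins you want.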
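library(dplyr)
library(tidyr)      # unite() comes from tidyr
library(lubridate)
library(purrr)
# Bin widths to average over; each one becomes an element of the output list
breaks <- c("15 min", "30 min", "1 hour", "2 hour", "4 hour", "6 hour")
breaks %>%
  map(~ R3L12 %>%
        # Build a single date-time column from Date and Time
        unite("Date-Time", c("Date", "Time"), sep = " ", remove = FALSE) %>%
        mutate(`Date-Time` = ymd_hms(`Date-Time`)) %>%
        # Assign each row to a bin of width .x (the current element of breaks)
        mutate(DS = cut(`Date-Time`, breaks = .x)) %>%
        group_by(ymd(Date), DS) %>%
        summarise(avg = mean(m.Tb))) %>%
  set_names(breaks)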
$`15 min`
# A tibble: 8 x 3
# Groups: ymd(Date) [1]
`ymd(Date)` DS avg
<date> <fct> <dbl>
1 2015-05-23 2015-05-23 07:25:00 12.5
2 2015-05-23 2015-05-23 07:40:00 12.5
3 2015-05-23 2015-05-23 09:10:00 13.0
4 2015-05-23 2015-05-23 11:40:00 16.4
5 2015-05-23 2015-05-23 11:55:00 16.9
6 2015-05-23 2015-05-23 12:25:00 19.6
7 2015-05-23 2015-05-23 12:40:00 20.0
8 2015-05-23 2015-05-23 13:25:00 24.4
$`30 min`
# A tibble: 6 x 3
# Groups: ymd(Date) [1]
`ymd(Date)` DS avg
<date> <fct> <dbl>
1 2015-05-23 2015-05-23 07:25:00 12.5
2 2015-05-23 2015-05-23 08:55:00 13.0
3 2015-05-23 2015-05-23 11:25:00 16.4
4 2015-05-23 2015-05-23 11:55:00 16.9
5 2015-05-23 2015-05-23 12:25:00 19.8
6 2015-05-23 2015-05-23 13:25:00 24.4
$`1 hour`
# A tibble: 5 x 3
# Groups: ymd(Date) [1]
`ymd(Date)` DS avg
<date> <fct> <dbl>
1 2015-05-23 2015-05-23 07:00:00 12.5
2 2015-05-23 2015-05-23 09:00:00 13.0
3 2015-05-23 2015-05-23 11:00:00 16.5
4 2015-05-23 2015-05-23 12:00:00 18.9
5 2015-05-23 2015-05-23 13:00:00 24.4
$`2 hour`
# A tibble: 4 x 3
# Groups: ymd(Date) [1]
`ymd(Date)` DS avg
<date> <fct> <dbl>
1 2015-05-23 2015-05-23 07:00:00 12.5
2 2015-05-23 2015-05-23 09:00:00 13.0
3 2015-05-23 2015-05-23 11:00:00 18.0
4 2015-05-23 2015-05-23 13:00:00 24.4
$`4 hour`
# A tibble: 2 x 3
# Groups: ymd(Date) [1]
`ymd(Date)` DS avg
<date> <fct> <dbl>
1 2015-05-23 2015-05-23 07:00:00 12.6
2 2015-05-23 2015-05-23 11:00:00 19.0
$`6 hour`
# A tibble: 2 x 3
# Groups: ymd(Date) [1]
`ymd(Date)` DS avg
<date> <fct> <dbl>
1 2015-05-23 2015-05-23 07:00:00 15.6
2 2015-05-23 2015-05-23 13:00:00 24.4
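One detail worth checking in the cut() approach: for minute-based breaks such as "15 min" or "30 min", base R's cut.POSIXt() starts the first bin at the earliest observation (here 07:25) rather than at a clock boundary, while hour-based breaks start at the top of the first observation's hour (07:00). That is why the "30 min" element above starts at 07:25. The base-R sketch below uses only the ten sample times (assumed to be UTC) to show both behaviours, plus one way to get clock-aligned 30-minute bins by passing explicit break points; nonempty_counts() is a hypothetical helper.

```r
times <- as.POSIXct(
  paste("2015-05-23",
        c("07:25:00", "07:40:00", "07:45:00", "09:10:00", "11:45:00",
          "11:55:00", "12:05:00", "12:35:00", "12:45:00", "13:30:00")),
  tz = "UTC"
)

# Hypothetical helper: number of observations in each non-empty bin of a cut() factor.
nonempty_counts <- function(bins) {
  counts <- tabulate(as.integer(bins), nbins = nlevels(bins))
  counts[counts > 0]
}

# "30 min": first break is the earliest time (07:25), so bins are 07:25-07:55, ...
counts_30_cut <- nonempty_counts(cut(times, breaks = "30 min"))

# "1 hour": first break is 07:00, so bins follow the clock.
counts_60_cut <- nonempty_counts(cut(times, breaks = "1 hour"))

# Clock-aligned 30-minute bins: build the break points explicitly, starting at
# the top of the first hour and ending one step past the last time.
start <- as.POSIXct(trunc(min(times), units = "hours"))
breaks_30 <- seq(start, max(times) + 30 * 60, by = "30 min")
counts_30_clock <- nonempty_counts(cut(times, breaks = breaks_30))

counts_30_cut    # 3 1 1 2 2 1
counts_60_cut    # 3 1 2 3 1
counts_30_clock  # 1 2 1 2 1 2 1
```

The same explicit-breaks idea could be used inside the map() call if clock-aligned half-hour bins are preferred.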
This is how I did it. You have a lot of missing periods, so the result is not the best output for a half-hour aggregation.
data_example <- structure(list(Date = c("2015-05-23", "2015-05-23", "2015-05-23",
"2015-05-23", "2015-05-23", "2015-05-23", "2015-05-23", "2015-05-23",
"2015-05-23", "2015-05-23"), Time = c("07:25:00", "07:40:00",
"07:45:00", "09:10:00", "11:45:00", "11:55:00", "12:05:00", "12:35:00",
"12:45:00", "13:30:00"), Turtle = structure(c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("R3L1", "R3L11", "R3L12",
"R3L2", "R3L4", "R3L8", "R3L9", "R4L8", "R8L1", "R8L4", "R8NAT123"
), class = "factor"), Tex = c(11.891, 12.008, 12.055, 13.219,
18.727, 18.992, 19.477, 20.367, 20.641, 28.305), m.Tb = c(12.477,
12.54, 12.54, 12.978, 16.362, 16.612, 17.238, 19.617, 19.993,
24.371), m.HR = c(7.56457, 6.66759, 17.51107, 9.72277, 19.44553,
13.07674, 28.115, 14.99467, 17.16947, 40.40479), season = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("beginning",
"end", "middle"), class = "factor"), year = c(2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L), Mass = c(360L,
360L, 360L, 360L, 360L, 360L, 360L, 360L, 360L, 360L)), row.names = c(NA,
10L), class = "data.frame")
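library(tidyverse)
# Round a date-time down to the start of its 30-minute interval
floor_30 <- function(x) clock::date_floor(x = x, precision = "minute", n = 30)
# Note: this sums m.Tb within each interval and then averages those sums,
# returning a single number per call (here, one per date)
mean_at_inteval <- function(data, date_col, interval_func) {
  data |>
    group_by(interval = {{date_col}} |> interval_func()) |>
    summarise(sum_interval = sum(m.Tb)) |>
    summarise(mean_interval = mean(sum_interval))
}
# Parse Date + Time into a UTC date-time, then nest the rows by date
nest_example_data <- data_example %>%
  mutate(date_timer = str_c(Date, Time, sep = " ") %>% clock::date_time_parse(zone = "UTC")) |>
  nest_by(Date)
# Apply the 30-minute summary to each date's nested data
final_data <- nest_example_data |>
  mutate(floor_30 = data |> mean_at_inteval(date_col = date_timer, interval_func = floor_30))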
final_data
#> # A tibble: 1 x 3
#> # Rowwise: Date
#> Date data floor_30$mean_interval
#> <chr> <list<tibble[,9]>> <dbl>
#> 1 2015-05-23 [10 x 9] 23.5
Created on 2021-05-30 by the reprex package (v2.0.0)
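Note that mean_at_inteval() sums m.Tb inside each 30-minute interval and then averages those sums, so the 23.5 above is the mean of per-interval totals for the day, not the average m.Tb per interval that the question asks for. The base-R sketch below (not part of the answer; sample values re-entered by hand, timestamps assumed to be UTC) floors each time to a clock-aligned 30-minute bin and computes both quantities so the difference is visible.

```r
m_tb <- c(12.477, 12.54, 12.54, 12.978, 16.362, 16.612, 17.238,
          19.617, 19.993, 24.371)
times <- as.POSIXct(
  paste("2015-05-23",
        c("07:25:00", "07:40:00", "07:45:00", "09:10:00", "11:45:00",
          "11:55:00", "12:05:00", "12:35:00", "12:45:00", "13:30:00")),
  tz = "UTC"
)

# Index of the 30-minute bin each time falls in (floor of seconds / 1800).
# In UTC this gives clock-aligned bins: 07:00, 07:30, 09:00, ...
bin <- floor(as.numeric(times) / (30 * 60))

per_bin_sum  <- tapply(m_tb, bin, sum)   # total m.Tb per bin
per_bin_mean <- tapply(m_tb, bin, mean)  # average m.Tb per bin

length(per_bin_sum)  # 7 non-empty bins on this day
mean(per_bin_sum)    # about 23.53: the per-day quantity mean_at_inteval() returns
round(per_bin_mean, 3)
```

If per-interval averages are what you need, replacing sum(m.Tb) with mean(m.Tb) and dropping the second summarise() in mean_at_inteval() would keep one row per interval instead of collapsing the day to one number.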