Work out rolling sums for variables with non-consecutive days in a dataframe in R
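I have some data for which I want to work out the win percentage in 14-day rolling windows across roughly 7 years of results. The days are non-consecutive, so whenever I group by the "Trainer" variable and run rollapplyr or runSum / sum_run, I get the past 14 events summed rather than the past 14 days, and I can't figure out how to group by 14 days. When I try to define the width or k value from the dates, I get one of these errors:
Invalid time series parameters specified
or
'vec' must be sorted non-decreasingly and not contain NAs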
Edit: the code below gives the errors above.
df %>% group_by(Trainer) %>% mutate(Fourteen_day_wins = rollapplyr(Wins, width = 1:n() - findInterval( Date %d-% Days(14), Date), sum)) %>% ungroup
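I would like to get the total wins and the count of events over a rolling 14-day period, grouped by Trainer, as new columns in my df. Can anyone point me in the right direction? I'm still very much an R newbie, so this has me stumped!
Sample df: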
structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000, 1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600, 1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800, 1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400, 1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600, 1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400, 1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400, 1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000, 1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200, 1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200, 1525219200, 1525219200), tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl", "data.frame"))
You can use complete to fill in the missing dates in your data, and then use a window of 14 periods:
df %>%
  group_by(Trainer) %>%
  complete(Date = seq(min(Date), max(Date), '1 day')) %>%
  mutate(runMeans = zoo::rollmean(Wins, 14, 0, na.rm = TRUE))
# A tibble: 459 x 4
# Groups: Trainer [2]
Trainer Date Wins runMeans
<chr> <dttm> <dbl> <dbl>
1 Appleby, Charlie 2017-10-15 00:00:00 1 0
2 Appleby, Charlie 2017-10-16 00:00:00 NA 0
3 Appleby, Charlie 2017-10-17 00:00:00 NA 0
4 Appleby, Charlie 2017-10-18 00:00:00 1 0
5 Appleby, Charlie 2017-10-18 00:00:00 0 0
6 Appleby, Charlie 2017-10-19 00:00:00 NA 0
7 Appleby, Charlie 2017-10-20 00:00:00 NA 0.429
8 Appleby, Charlie 2017-10-21 00:00:00 NA 0.429
9 Appleby, Charlie 2017-10-22 00:00:00 NA 0.429
10 Appleby, Charlie 2017-10-23 00:00:00 0 0.375
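A note on the call above: in zoo::rollmean the third positional argument is fill (so the 0 pads the ends) and the default alignment is centered. The zoo documentation also states that the default rollmean method does not handle inputs containing NAs and suggests rollapply for that case. A minimal sketch of that variant follows, on made-up toy data (the toy tibble and its values are assumptions for illustration, not the asker's data), using a right-aligned window with partial = TRUE:

```r
library(dplyr)
library(tidyr)
library(zoo)

# Hypothetical toy data: one trainer, events on non-consecutive days
toy <- tibble(
  Trainer = "A",
  Date    = as.Date("2022-01-01") + c(0, 2, 5, 20),
  Wins    = c(1, 0, 1, 1)
)

res <- toy %>%
  group_by(Trainer) %>%
  # insert one row per missing calendar day (Wins is NA on those rows)
  complete(Date = seq(min(Date), max(Date), by = "1 day")) %>%
  # right-aligned window of 14 rows; NAs are dropped inside mean()
  mutate(runMeans = rollapplyr(Wins, width = 14, FUN = mean,
                               na.rm = TRUE, partial = TRUE)) %>%
  ungroup()

print(res, n = 21)
```

Windows that contain no events at all come out as NaN here, since mean() of an empty vector is NaN. Whether that or 0 is the right value depends on how a trainer with no runners should be reported.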
One option is to build every combination of day and trainer, join that to the original data, and then use a 14-day window:
library(zoo)
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
library(tidyverse)
df <- structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000, 1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600, 1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800, 1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400, 1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600, 1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400, 1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400, 1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000, 1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200, 1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200, 1525219200, 1525219200), tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl", "data.frame"))
all_dates <- with(df, expand_grid(Trainer = unique(Trainer),
                                  Date = seq(min(Date), max(Date), by = "1 day")))
all_dates <- left_join(all_dates, df)
#> Joining, by = c("Trainer", "Date")
all_dates %>%
  group_by(Trainer) %>%
  mutate(win_pct = rollapplyr(Wins,
                              width = 14,
                              mean,
                              partial = TRUE,
                              align = "right",
                              na.rm = TRUE,
                              fill = TRUE))
#> # A tibble: 460 × 4
#> # Groups: Trainer [2]
#> Trainer Date Wins win_pct
#> <chr> <dttm> <dbl> <dbl>
#> 1 Appleby, Charlie 2017-10-15 00:00:00 1 1
#> 2 Appleby, Charlie 2017-10-16 00:00:00 NA 1
#> 3 Appleby, Charlie 2017-10-17 00:00:00 NA 1
#> 4 Appleby, Charlie 2017-10-18 00:00:00 1 1
#> 5 Appleby, Charlie 2017-10-18 00:00:00 0 0.667
#> 6 Appleby, Charlie 2017-10-19 00:00:00 NA 0.667
#> 7 Appleby, Charlie 2017-10-20 00:00:00 NA 0.667
#> 8 Appleby, Charlie 2017-10-21 00:00:00 NA 0.667
#> 9 Appleby, Charlie 2017-10-22 00:00:00 NA 0.667
#> 10 Appleby, Charlie 2017-10-23 00:00:00 0 0.5
#> # … with 450 more rows
Created on 2022-05-31 by the reprex package (v2.0.1)
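One thing to watch in the output above: a date with several events (2017-10-18, for example) occupies several rows, so a window of 14 rows can span fewer than 14 calendar days. A possible way around this, sketched here on hypothetical toy data, is to collapse to one row per trainer and day first, so that 14 rows really are 14 days. The column names wins, runs, wins_14d, runs_14d and win_pct are made up for this illustration:

```r
library(dplyr)
library(tidyr)
library(zoo)

# Hypothetical toy data: several events can share a date
toy <- tibble(
  Trainer = "A",
  Date    = as.Date("2022-01-01") + c(0, 0, 3, 3, 3, 16),
  Wins    = c(1, 0, 0, 1, 1, 0)
)

daily <- toy %>%
  # one row per trainer and day: wins and number of events on that day
  group_by(Trainer, Date) %>%
  summarise(wins = sum(Wins), runs = n(), .groups = "drop") %>%
  group_by(Trainer) %>%
  # add the missing days with zero wins and zero events
  complete(Date = seq(min(Date), max(Date), by = "1 day"),
           fill = list(wins = 0, runs = 0L)) %>%
  # 14 rows now equal 14 calendar days (current day plus the previous 13)
  mutate(wins_14d = rollapplyr(wins, 14, sum, partial = TRUE),
         runs_14d = rollapplyr(runs, 14, sum, partial = TRUE),
         win_pct  = if_else(runs_14d > 0, wins_14d / runs_14d, NA_real_)) %>%
  ungroup()

print(daily, n = 17)
```

This gives the total wins and the event count the question asks for, at the cost of returning one row per day rather than one row per event. The daily rows can be joined back to the event-level data on Trainer and Date if needed.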
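The problem is that the arguments to findInterval should be numeric and sorted.
To fix this, convert the dates to Date class and then to numeric, so that d below is the number of days since the Epoch. Now we can use it with findInterval as shown. If the data is already sorted, the arrange line can be omitted.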
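library(dplyr, exclude = c("filter", "lag"))
library(zoo)
# DF is the sample data frame from the question (called df there)
DF %>%
  arrange(Trainer, Date) %>%
  group_by(Trainer) %>%
  mutate(d = as.numeric(as.Date(Date)),
         # width = number of rows up to this one dated after d - 14
         Wins14 = rollapplyr(Wins, 1:n() - findInterval(d - 14, d), sum)) %>%
  ungroup %>%
  select(-d)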
giving:
# A tibble: 101 x 4
Trainer Wins Date Wins14
<chr> <dbl> <dttm> <dbl>
1 Appleby, Charlie 1 2017-10-15 00:00:00 1
2 Appleby, Charlie 1 2017-10-18 00:00:00 2
3 Appleby, Charlie 0 2017-10-18 00:00:00 2
4 Appleby, Charlie 0 2017-10-23 00:00:00 2
5 Appleby, Charlie 1 2017-10-25 00:00:00 3
6 Appleby, Charlie 0 2017-10-25 00:00:00 3
7 Appleby, Charlie 0 2017-10-25 00:00:00 3
8 Appleby, Charlie 1 2017-10-25 00:00:00 4
9 Appleby, Charlie 0 2017-10-27 00:00:00 4
10 Appleby, Charlie 0 2017-10-27 00:00:00 4
# ... with 91 more rows
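The question also asks for the event count and the win percentage. In the approach above, the per-row window width is itself the number of events in the 14-day window, so it can be kept as a column. A minimal sketch on hypothetical toy data (Runs14 and Pct14 are names made up for this illustration):

```r
library(dplyr, exclude = c("filter", "lag"))
library(zoo)

# Hypothetical toy data: two trainers, non-consecutive days
toy <- tibble(
  Trainer = c("A", "A", "A", "A", "B", "B"),
  Date    = as.POSIXct(c("2022-01-01", "2022-01-03", "2022-01-14",
                         "2022-01-15", "2022-01-01", "2022-01-20"),
                       tz = "UTC"),
  Wins    = c(1, 0, 1, 0, 1, 1)
)

res <- toy %>%
  arrange(Trainer, Date) %>%
  group_by(Trainer) %>%
  mutate(d      = as.numeric(as.Date(Date)),       # days since the Epoch
         # rows up to this one, minus rows dated d - 14 or earlier
         Runs14 = 1:n() - findInterval(d - 14, d),
         Wins14 = rollapplyr(Wins, Runs14, sum),   # per-row window width
         Pct14  = Wins14 / Runs14) %>%
  ungroup() %>%
  select(-d)

print(res)
```

For trainer A, the row dated 2022-01-15 gets Runs14 = 3 because the event on 2022-01-01 is exactly 14 days old and falls outside the window, which covers the current day and the previous 13. Note also that rows sharing a date only see the rows listed before them, so same-day rows can carry different totals, as in rows 5 to 8 of the output above.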