Calculating a rolling sum of a variable over non-consecutive days in an R data frame

Question

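I have some data for which I would like to work out the winning percentage over a 14-day rolling window, across roughly 7 years of results. The days are non-consecutive, so whenever I group by the "Trainer" variable and run rollapplyr or runSum / sum_run, I total up the last 14 events, and I cannot work out how to group by the last 14 days instead. When I try to use the date to define the width or k value, I get the errors

invalid time series parameters specified

or 'vec' must be sorted non-decreasingly and not contain NAs

Edit: the code below gives the errors above.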

df %>%
  group_by(Trainer) %>%
  mutate(Fourteen_day_wins =
           rollapplyr(Wins, width = 1:n() - findInterval(Date %d-% Days(14), Date), sum)) %>%
  ungroup

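I would like to get the total number of wins and the count of events over a rolling 14-day period, grouped by Trainer, as new columns in my df. Can anyone point me in the right direction? I am still an R novice, so this has me stumped!

Sample df: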

structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J",  "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J",  "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie",  "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J",  "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie",  "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0,  0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,  0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0,  0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 
1, 0, 0, 1, 0, 0,  1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,  1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000,  1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600,  1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800,  1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600,  1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400,  1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800,  1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600,  1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600,  1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600,  1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400,  1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600,  1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600,  1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000,  1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400,  1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000,  1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200,  1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200,  1525219200, 1525219200), tzone = "UTC", class = c("POSIXct",  "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl",  "data.frame"))

Answer 1

You can use complete to fill in the missing dates and then apply a window of width 14:

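    library(dplyr)
    library(tidyr)

    df %>%
      group_by(Trainer) %>%
      # add a row (Wins = NA) for every calendar day on which the trainer has no event
      complete(Date = seq(min(Date), max(Date), by = '1 day')) %>%
      # rolling mean of width 14; the third positional argument of rollmean is fill
      mutate(runMeans = zoo::rollmean(Wins, 14, fill = 0, na.rm = TRUE))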
    # A tibble: 459 x 4
    # Groups:   Trainer [2]
       Trainer          Date                 Wins runMeans
       <chr>            <dttm>              <dbl>    <dbl>
     1 Appleby, Charlie 2017-10-15 00:00:00     1    0    
     2 Appleby, Charlie 2017-10-16 00:00:00    NA    0    
     3 Appleby, Charlie 2017-10-17 00:00:00    NA    0    
     4 Appleby, Charlie 2017-10-18 00:00:00     1    0    
     5 Appleby, Charlie 2017-10-18 00:00:00     0    0    
     6 Appleby, Charlie 2017-10-19 00:00:00    NA    0    
     7 Appleby, Charlie 2017-10-20 00:00:00    NA    0.429
     8 Appleby, Charlie 2017-10-21 00:00:00    NA    0.429
     9 Appleby, Charlie 2017-10-22 00:00:00    NA    0.429
    10 Appleby, Charlie 2017-10-23 00:00:00     0    0.375

Answer 2

One option is to build every combination of day and trainer, join it to the original data, and then use a 14-day window:

library(zoo)
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(tidyverse)
df <- structure(list(Trainer = c("Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J", "Haggas, W J", "Appleby, Charlie", "Haggas, W J", "Appleby, Charlie", "Appleby, Charlie", "Haggas, W J"), Wins = c(1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 
1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0), Date = structure(c(1508025600, 1508112000, 1508112000, 1508112000, 1508198400, 1508284800, 1508284800, 1508284800, 1508457600, 1508457600, 1508544000, 1508544000, 1508544000, 1508716800, 1508716800, 1508716800, 1508803200, 1508803200, 1508803200, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1508889600, 1509062400, 1509062400, 1509062400, 1509062400, 1509062400, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509148800, 1509321600, 1509321600, 1509321600, 1509321600, 1509494400, 1509667200, 1509667200, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1509753600, 1510099200, 1510099200, 1510099200, 1510358400, 1510358400, 1510358400, 1521936000, 1521936000, 1523923200, 1523923200, 1523923200, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524009600, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524096000, 1524182400, 1524182400, 1524182400, 1524268800, 1524268800, 1524268800, 1524528000, 1524528000, 1524528000, 1524528000, 1524614400, 1524614400, 1524614400, 1524787200, 1524787200, 1524787200, 1524787200, 1524787200, 1525132800, 1525219200, 1525219200, 1525219200), tzone = "UTC", class = c("POSIXct", "POSIXt"))), row.names = c(NA, -101L), class = c("tbl_df", "tbl", "data.frame"))

all_dates <- with(df, expand_grid(Trainer = unique(Trainer), 
                                  Date = seq(min(Date), max(Date), by="1 day")))

all_dates <- left_join(all_dates, df)
#> Joining, by = c("Trainer", "Date")

all_dates %>% 
  group_by(Trainer) %>% 
  mutate(win_pct = rollapplyr(Wins, 
                              width=14, 
                              mean, 
                              partial = TRUE, 
                              align="right", 
                              na.rm=TRUE, 
                              fill=TRUE))
#> # A tibble: 460 × 4
#> # Groups:   Trainer [2]
#>    Trainer          Date                 Wins win_pct
#>    <chr>            <dttm>              <dbl>   <dbl>
#>  1 Appleby, Charlie 2017-10-15 00:00:00     1   1    
#>  2 Appleby, Charlie 2017-10-16 00:00:00    NA   1    
#>  3 Appleby, Charlie 2017-10-17 00:00:00    NA   1    
#>  4 Appleby, Charlie 2017-10-18 00:00:00     1   1    
#>  5 Appleby, Charlie 2017-10-18 00:00:00     0   0.667
#>  6 Appleby, Charlie 2017-10-19 00:00:00    NA   0.667
#>  7 Appleby, Charlie 2017-10-20 00:00:00    NA   0.667
#>  8 Appleby, Charlie 2017-10-21 00:00:00    NA   0.667
#>  9 Appleby, Charlie 2017-10-22 00:00:00    NA   0.667
#> 10 Appleby, Charlie 2017-10-23 00:00:00     0   0.5  
#> # … with 450 more rows

Created on 2022-05-31 by the reprex package (v2.0.1)
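Both padding approaches treat 14 rows as 14 days, which holds only when a trainer has exactly one row per calendar day. The sample data has several events on some dates (for instance the two rows for 2017-10-18 in the output above), so one possible variation is to collapse to daily totals before rolling. The following is a minimal base-R sketch on made-up data for a single trainer, not the answerer's method; `ev`, `day_idx`, and `roll14` are illustrative names, and the rolling sum is done with cumulative sums rather than zoo.

```r
# Made-up events for one trainer; two events share 2017-10-18
ev <- data.frame(
  Date = as.Date(c("2017-10-15", "2017-10-18", "2017-10-18", "2017-11-02")),
  Wins = c(1, 1, 0, 1)
)

# Position of each event on a gap-free daily calendar (first event = day 1)
day_idx <- as.integer(ev$Date - min(ev$Date)) + 1
n_days  <- max(day_idx)

# Daily totals; days without events get 0 wins and 0 events
wins_daily   <- sapply(seq_len(n_days), function(k) sum(ev$Wins[day_idx == k]))
events_daily <- tabulate(day_idx, nbins = n_days)

# Right-aligned rolling sum over at most 14 calendar days, via cumulative sums
roll14 <- function(x) {
  cs <- c(0, cumsum(x))
  i  <- seq_along(x)
  cs[i + 1] - cs[pmax(i - 14, 0) + 1]
}

wins14   <- roll14(wins_daily)
events14 <- roll14(events_daily)

# Win percentage; NA where the window contains no events
win_pct <- ifelse(events14 > 0, wins14 / events14, NA)

result <- data.frame(Date = min(ev$Date) + seq_len(n_days) - 1,
                     wins14, events14, win_pct)
print(tail(result))
```

With one row per day, the window width in rows and the window width in days coincide, and the percentage is taken over events rather than over days.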

Answer 3

The problem is that the arguments to findInterval must be numeric and sorted.
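The sorted-input requirement can be checked in isolation. This is a minimal base-R sketch with made-up vectors; `sorted_days` and `unsorted_days` are illustrative names and are not part of the question's data.

```r
# findInterval(x, vec) requires vec to be sorted non-decreasingly and free of NAs
sorted_days   <- c(1, 3, 5)
unsorted_days <- c(5, 1, 3)

# Sorted input: for each x, the number of elements of vec that are <= x
ok <- findInterval(c(0, 3, 10), sorted_days)

# Unsorted input: capture the error message instead of stopping the script
msg <- tryCatch(findInterval(3, unsorted_days),
                error = function(e) conditionMessage(e))

print(ok)
print(msg)
```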

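To fix this, convert the dates to Date class and then to numeric, so that d below is the number of days since the Epoch. Now we can use it with findInterval as shown. If the data is already sorted, the arrange line can be omitted.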

library(dplyr, exclude = c("filter", "lag"))
library(zoo)

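df %>%
  arrange(Trainer, Date) %>%   # findInterval needs the dates sorted within each trainer
  group_by(Trainer) %>%
  mutate(d = as.numeric(as.Date(Date)),   # days since the Epoch
         # width for row i: rows up to i dated from d[i] - 13 through d[i]
         Wins14 = rollapplyr(Wins, 1:n() - findInterval(d - 14, d), sum)) %>%
  ungroup() %>%
  select(-d)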

giving:

# A tibble: 101 x 4
   Trainer           Wins Date                Wins14
   <chr>            <dbl> <dttm>               <dbl>
 1 Appleby, Charlie     1 2017-10-15 00:00:00      1
 2 Appleby, Charlie     1 2017-10-18 00:00:00      2
 3 Appleby, Charlie     0 2017-10-18 00:00:00      2
 4 Appleby, Charlie     0 2017-10-23 00:00:00      2
 5 Appleby, Charlie     1 2017-10-25 00:00:00      3
 6 Appleby, Charlie     0 2017-10-25 00:00:00      3
 7 Appleby, Charlie     0 2017-10-25 00:00:00      3
 8 Appleby, Charlie     1 2017-10-25 00:00:00      4
 9 Appleby, Charlie     0 2017-10-27 00:00:00      4
10 Appleby, Charlie     0 2017-10-27 00:00:00      4
# ... with 91 more rows
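To see how the width vector is built, and how it also yields the event count and win percentage the question asks for, here is a small worked example. It is a base-R sketch on made-up numbers for a single trainer; `d`, `wins`, and `n_old` are illustrative names, and the rolling sum uses cumulative sums in place of rollapplyr so that it runs without zoo.

```r
# Made-up data for one trainer: event days (as day numbers, already sorted) and results
d    <- c(1, 1, 5, 14, 15, 30)
wins <- c(1, 0, 1, 1, 0, 1)
i    <- seq_along(d)

# For each event, how many events fall on or before day d - 14,
# that is, outside the 14-day window ending on day d
n_old <- findInterval(d - 14, d)

# Window width in rows = number of events inside the window (current event included)
width <- i - n_old

# Rolling sum of wins over exactly those rows, via cumulative sums
cs     <- c(0, cumsum(wins))
wins14 <- cs[i + 1] - cs[n_old + 1]

# width is never 0 because each window contains its own event
win_pct <- wins14 / width

print(data.frame(d, wins, width, wins14, win_pct))
```

The expression `1:n() - findInterval(d - 14, d)` passed to rollapplyr in the answer is the same quantity as `width` here, so storing it in its own column gives the event count, and dividing the rolling sum by it gives the win percentage.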
