简体   繁体   English

在 r 中按组计算加权滚动平均值?

[英]Calculate a Weighted Rolling Average by rows by group in r?

I have a dataframe games_h.我有一个 dataframe games_h。 This is just a snippet of the table but it has many teams and is sorted by date, team, game number.这只是表格的一小部分,但它有很多球队,并按日期、球队、比赛编号排序。 I am trying to create a weighted rolling average grouped by the team.我正在尝试创建一个由团队分组的加权滚动平均值。 I would like the most recent game to be weighted more than two games ago.我希望最近一场比赛的权重超过两场比赛之前。 So the weights would be (Game_1 * 1+ Game_2 *2)/3 or weights equal to 1 with same ratio so weights = c(1-.667, .667).因此权重将为 (Game_1 * 1+ Game_2 *2)/3 或权重等于 1 且比率相同,因此权重 = c(1-.667, .667)。

dput(games_h)
structure(list(GameId = c(16, 16, 37, 37, 57, 57), GameDate = structure(c(17905, 
17905, 17916, 17916, 17926, 17926), class = "Date"), NeutralSite = c(0, 
0, 0, 0, 0, 0), AwayTeam = c("Virginia Cavaliers", "Virginia Cavaliers", 
"Florida State Seminoles", "Florida State Seminoles", "Syracuse Orange", 
"Syracuse Orange"), HomeTeam = c("Boston College Eagles", "Boston College Eagles", 
"Boston College Eagles", "Boston College Eagles", "Boston College Eagles", 
"Boston College Eagles"), Team = c("Virginia Cavaliers", "Boston College Eagles", 
"Florida State Seminoles", "Boston College Eagles", "Syracuse Orange", 
"Boston College Eagles"), Home = c(0, 1, 0, 1, 0, 1), Score = c(83, 
56, 82, 87, 77, 71), AST = c(17, 6, 12, 16, 11, 13), TOV = c(10, 
8, 9, 13, 11, 11), STL = c(5, 4, 4, 6, 6, 5), BLK = c(6, 0, 4, 
4, 1, 0), Rebounds = c(38, 18, 36, 33, 23, 23), ORB = c(7, 4, 
16, 10, 7, 6), DRB = c(31, 14, 20, 23, 16, 17), FGA = c(55, 57, 
67, 55, 52, 45), FGM = c(33, 22, 28, 27, 29, 21), X3FGM = c(8, 
7, 8, 13, 11, 9), X3FGA = c(19, 25, 25, 21, 26, 22), FTA = c(14, 
9, 24, 28, 15, 23), FTM = c(9, 5, 18, 20, 8, 20), Fouls = c(16, 
12, 25, 20, 19, 19), Game_Number = 1:6, Count = c(1, 1, 1, 1, 
1, 1)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), groups = structure(list(HomeTeam = "Boston College Eagles", 
    .rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

Here is an example output of the score column.这是得分列的示例 output。

在此处输入图像描述

Here is my failed attempt.这是我失败的尝试。 The function work correctly but I cannot apply it to the entire dataframe by group. function 工作正常,但我无法按组将其应用于整个 dataframe。


weighted_avg<-function(x, wt1, wt2) {
  
rs1 = rollsum(x,1,align = "right")
rs2 = rollsum(x,2,align = "right")
rs1=rs1[-1]
rs3 = rs2 - rs1
weighted_avg= ((rs3 * wt2)+ (rs1*wt1))/(wt1+wt2)
return(weighted_avg)
}



weighted_avg(csum$Score_Y, 2, 1)
apply(csum$Score_Y , 2, weighted_avg, wt1 = 2, wt2=1)
test<-csum %>%
  group_by(Team)%>%
  group_map(across(c(Score:Fouls), weighted_avg(.x$Team, 2, 1)  ))

test<-csum %>%
  group_by(Team)%>%
  group_walk(across(c(Score:Fouls),weighted_avg(.~,2,1)  ))

Here are some notes about the code:以下是有关代码的一些注释:

  • I used slider::slide_dbl function. First we specify the vector for which we would like to compute the moving average Score .我使用slider::slide_dbl function。首先我们指定要为其计算移动平均Score的向量。
  • As we need a sliding window of length 2, I used .before argument in slide_dbl to use the previous value and a current value to be used for calculating moving average.因为我们需要长度为 2 的滑动 window,所以我在slide_dbl中使用了.before参数来使用先前的值和当前值来计算移动平均值。
  • Also I set .complete argument to TRUE to makes sure to only calculate moving average when we have a previous value.此外,我将.complete参数设置为TRUE以确保仅在我们有先前值时才计算移动平均线。 In other word we don't have any moveing average in first row.换句话说,我们在第一行没有任何移动平均线。

For more info check the documentation for slider package.有关详细信息,请查看slider package 的文档。

library(tidyverse)
library(slider)

df %>%
  group_by(HomeTeam) %>%
  summarise(Example = c(NA, slide_dbl(Score, .before = 1, .complete = TRUE, 
                                      .f = ~ (.x[1] * 1 + .x[2] * 2) / 3)))

`summarise()` has grouped output by 'HomeTeam'. You can override using the `.groups` argument.
# A tibble: 7 × 2
# Groups:   HomeTeam [1]
  HomeTeam              Example
  <chr>                   <dbl>
1 Boston College Eagles    NA  
2 Boston College Eagles    NA  
3 Boston College Eagles    65  
4 Boston College Eagles    73.3
5 Boston College Eagles    85.3
6 Boston College Eagles    80.3
7 Boston College Eagles    73 

If it is going to calculate moving average for all numeric columns you could try:如果要计算所有数字列的移动平均值,您可以尝试:

df %>%
  group_by(HomeTeam) %>%
  summarise(across(where(is.numeric), ~ c(NA, slide_dbl(., .before = 1, .complete = TRUE, 
                                                      .f = ~ (.x[1] * 1 + .x[2] * 2) / 3)))) %>%
  ungroup()

`summarise()` has grouped output by 'HomeTeam'. You can override using the `.groups` argument.
# A tibble: 7 × 21
  HomeTeam  GameId NeutralSite   Home Score   AST   TOV   STL    BLK Rebounds   ORB   DRB   FGA   FGM
  <chr>      <dbl>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 Boston C…   NA            NA NA      NA   NA    NA    NA    NA         NA   NA     NA    NA    NA  
2 Boston C…   NA            NA NA      NA   NA    NA    NA    NA         NA   NA     NA    NA    NA  
3 Boston C…   16             0  0.667  65    9.67  8.67  4.33  2         24.7  5     19.7  56.3  25.7
4 Boston C…   30             0  0.333  73.3 10     8.67  4     2.67      30   12     18    63.7  26  
5 Boston C…   37             0  0.667  85.3 14.7  11.7   5.33  4         34   12     22    59    27.3
6 Boston C…   50.3           0  0.333  80.3 12.7  11.7   6     2         26.3  8     18.3  53    28.3
7 Boston C…   57             0  0.667  73   12.3  11     5.33  0.333     23    6.33  16.7  47.3  23.7
# … with 7 more variables: X3FGM <dbl>, X3FGA <dbl>, FTA <dbl>, FTM <dbl>, Fouls <dbl>,
#   Game_Number <dbl>, Count <dbl>

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM