How do I calculate a weighted moving average with custom weights?

I'm working with NHL player performance data and have a data frame with the following variables (among others). war_82 is a measure of player value over a full 82-game season. The data spans 11 seasons, from 2007-2008 to 2017-2018.

 first_name last_name season    war_82
   <chr>      <chr>     <chr>      <dbl>
 1 5EBASTIAN  AHO       2017-2018 -0.560
 2 AARON      DELL      2016-2017  7.50 
 3 AARON      DELL      2017-2018  1.61 
 4 AARON      DOWNEY    2007-2008 -0.560
 5 AARON      EKBLAD    2014-2015  0.350
 6 AARON      EKBLAD    2015-2016 -0.350
 7 AARON      EKBLAD    2016-2017 -1.39 
 8 AARON      EKBLAD    2017-2018 -0.320
 9 AARON      JOHNSON   2007-2008 -1.42 
10 AARON      JOHNSON   2008-2009 -1.19 

I'd like to reduce the season-to-season variability of the war_82 metric by creating a new variable that is a weighted war_82. Ideally I'd look at 3 seasons of data, with season n (the current season) weighted most heavily and seasons n-1 and n-2 (the two preceding seasons) weighted less heavily as recency decreases. Let's say weights of 0.5, 0.3, and 0.2 for argument's sake.

UPDATE FOR CLARITY: I'm hoping to calculate a weighted moving average. For example, Sidney Crosby's 20172018_weighted_war would be determined by 2017-2018, 2016-2017, and 2015-2016. His 20162017_weighted_war would be determined by 2016-2017, 2015-2016, and 2014-2015. And so on.

I have two main questions:

1) What method would you recommend for this? I've looked at weighted.mean(), but some players have played more seasons than others, so I'm not sure how to specify the "w" (weights) argument. For example, Sidney Crosby played during all 11 seasons in my data set, but many players only played during 1 or 2 seasons. I don't really want to throw out data for players who have played fewer than 3 seasons.

2) How would you determine the weights for each season? The simplest method is the one I've mentioned above, which was loosely inspired by the Marcel method ( https://www.beyondtheboxscore.com/2016/2/22/11079186/projections-marcel-pecota-zips-steamer-explained-guide-math-is-fun ). I suppose you could also determine how well seasons n-1 and n-2 predict season n, and use those as your weights?
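One way to read the last idea in question 2 is as a regression of season n on seasons n-1 and n-2, with the fitted coefficients rescaled to sum to 1. The sketch below is only an illustration under that assumption: the data are simulated stand-ins (not the NHL data), and the column names war_n, war_n1, and war_n2 are made up for the example.

```r
set.seed(1)

# Simulated stand-in data, NOT the real NHL data: 200 hypothetical players,
# each with three consecutive seasons driven by a shared "talent" plus noise.
n_players <- 200
talent <- rnorm(n_players)
seasons <- data.frame(
  war_n2 = talent + rnorm(n_players),  # season n-2
  war_n1 = talent + rnorm(n_players),  # season n-1
  war_n  = talent + rnorm(n_players)   # season n (the one being predicted)
)

# Regress season n on the two preceding seasons.
fit <- lm(war_n ~ war_n1 + war_n2, data = seasons)

# Rescale the two slope coefficients so they sum to 1 and can act as weights.
b <- coef(fit)[c("war_n1", "war_n2")]
w_est <- b / sum(b)
round(w_est, 2)
```

With real data you would first reshape to one row per player-season with lagged columns. Rescaling only makes sense if both coefficients come out positive.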

How would you approach this problem? Any and all guidance is greatly appreciated!

My answer is similar to JasonAizkalns's, but it's different enough that I think it's worth posting.

You can fiddle with the weights for the seasons.

EDIT: Added 'rolling average'

data <- readr::read_table("
first_name last_name season    war_82
5EBASTIAN  AHO       2017-2018 -0.560
AARON      DELL      2016-2017  7.50 
AARON      DELL      2017-2018  1.61 
AARON      DOWNEY    2007-2008 -0.560
AARON      EKBLAD    2014-2015  0.350
AARON      EKBLAD    2015-2016 -0.350
AARON      EKBLAD    2016-2017 -1.39 
AARON      EKBLAD    2017-2018 -0.320
AARON      JOHNSON   2007-2008 -1.42 
AARON      JOHNSON   2008-2009 -1.19")

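library(tidyverse)

# last3_war is a space-separated string "current previous two_seasons_ago",
# with "NA" wherever the player has no earlier row.
weight_war <- function(last3_war) {
    player_season <- as.numeric(stringr::str_split_fixed(last3_war, " ", 3))
    if (is.na(player_season[2]))
        # only the current season is available
        player_season[1]
    else if (is.na(player_season[3]))
        # two seasons: the current season (first element) gets the larger weight
        weighted.mean(player_season[1:2], c(0.7, 0.3))
    else
        # three seasons, ordered current, n-1, n-2
        weighted.mean(player_season, c(0.5, 0.3, 0.2))
}

data %>%
    mutate(name = paste(first_name, last_name)) %>%
    group_by(name) %>%
    arrange(name, season) %>%
    # lag() works on each player's previous rows, so a skipped season is not detected
    mutate(last3_war = paste(war_82, lag(war_82), lag(war_82, 2))) %>%
    ungroup() %>%
    rowwise() %>%
    mutate(weighted_war_82 = weight_war(last3_war)) %>%
    ungroup() %>%
    select(name, season, war_82, weighted_war_82)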
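For comparison, here is a minimal base-R sketch of the same rolling idea that skips the paste-and-split round trip. The helper name rolling_weighted is made up for this example, and it assumes each player's rows are sorted oldest to newest with no skipped seasons. Because weighted.mean() divides by the sum of the weights it is given, a player with only two seasons ends up with 0.625 and 0.375 rather than the 0.7 and 0.3 used above.

```r
# Weighted moving average over the current value and up to two preceding ones.
# `x` must be ordered oldest to newest; w[1] applies to the current value.
rolling_weighted <- function(x, w = c(0.5, 0.3, 0.2)) {
  vapply(seq_along(x), function(i) {
    k <- min(i, length(w))            # seasons available so far, capped at 3
    recent <- x[i - seq_len(k) + 1]   # current, previous, two back
    weighted.mean(recent, w[seq_len(k)])  # renormalizes the weights it is given
  }, numeric(1))
}

# A few rows from the question's sample data.
war <- data.frame(
  name   = c(rep("AARON EKBLAD", 4), rep("AARON DELL", 2)),
  season = c("2014-2015", "2015-2016", "2016-2017", "2017-2018",
             "2016-2017", "2017-2018"),
  war_82 = c(0.35, -0.35, -1.39, -0.32, 7.50, 1.61)
)

# Sort within player, then apply the rolling function per player.
war <- war[order(war$name, war$season), ]
war$weighted_war_82 <- ave(war$war_82, war$name, FUN = rolling_weighted)
war
```

For Ekblad's 2017-2018 row this gives 0.5 * -0.32 + 0.3 * -1.39 + 0.2 * -0.35 = -0.647.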

I would recommend sticking to one question per post. A brute-force approach to your first question is to express the weights explicitly based on the number of seasons:

library(tidyverse)

df <- tribble(
  ~player, ~season, ~y,
  "dell", 2017, 1,
  "dell", 2018, 5,
  "johnson", 2016, 2,
  "johnson", 2017, 4,
  "johnson", 2018, 5,
  "downey", 2014, 3,
  "downey", 2015, 5
)

df %>%
  group_by(player) %>%
  arrange(player, season) %>%
  add_count(player, name = "num_seasons") %>%
  mutate(
    wtd = case_when(
      num_seasons == 1 ~ sum(                                           1.000 * nth(y, -1) ),
      num_seasons == 2 ~ sum(                      0.375 * nth(y, -2) + 0.625 * nth(y, -1) ),
      num_seasons == 3 ~ sum( 0.200 * nth(y, -3) + 0.300 * nth(y, -2) + 0.500 * nth(y, -1) )
    )
  )
#> # A tibble: 7 x 5
#> # Groups:   player [3]
#>   player  season     y num_seasons   wtd
#>   <chr>    <dbl> <dbl>       <int> <dbl>
#> 1 dell      2017     1           2  3.5 
#> 2 dell      2018     5           2  3.5 
#> 3 downey    2014     3           2  4.25
#> 4 downey    2015     5           2  4.25
#> 5 johnson   2016     2           3  4.1 
#> 6 johnson   2017     4           3  4.1 
#> 7 johnson   2018     5           3  4.1
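The hard-coded weights in the case_when() above appear to be the 3-season weights restricted to the most recent seasons and rescaled to sum to 1 (0.3 / 0.8 = 0.375 and 0.5 / 0.8 = 0.625). A small sketch of that arithmetic, using a hypothetical helper weights_for() that is not part of the answer:

```r
# 3-season weights ordered oldest to newest, matching the case_when() above.
base_w <- c(0.2, 0.3, 0.5)

# Keep the weights for the k most recent seasons and rescale them to sum to 1.
weights_for <- function(k, w = base_w) {
  recent <- tail(w, k)
  recent / sum(recent)
}

weights_for(1)  # 1
weights_for(2)  # 0.375 0.625
weights_for(3)  # 0.2 0.3 0.5
```

Deriving the weights this way avoids maintaining a separate hand-written row for each possible number of seasons.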

You can use weighted.mean() as suggested, weighting by the number of seasons played out of 11 (1 season -> 0.091, 2 seasons -> 0.18, and so on).
