Calculating Moving Average of a column based on values of another column in a dataframe Python (Pandas)
I am trying to create a column with the 10-day moving average of points for NBA players. My dataframe has game-by-game statistics for each player, and I would like the moving average column to contain the 10-day moving average as of each game. I have tried df.groupby('player')['points'].rolling(10, 1).mean(), but this just gives me the number of points scored on that day as the moving average. All of the players from each day are listed, and then the dataframe moves on to the following day, so I could have a couple hundred rows with the same date but different players' stats. Any help would be greatly appreciated. Thanks.
As stated, you really should provide a sample dataset and show what you are trying to achieve. However, I love working with sports data, so I don't mind putting in the minute or so to get a sample set.
So basically you need to do a rolling mean on a groupby. You'll notice that the first 9 rows of each player are blank (NaN), because a player doesn't have 10 games to average until the 10th row. You can change that by setting min_periods to 1. Also, when you do this, make sure your data is sorted by date (which here it already is).
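import pandas as pd

player_link_list = [
    'https://www.basketball-reference.com/players/l/lavinza01/gamelog/2021/',
    'https://www.basketball-reference.com/players/v/vucevni01/gamelog/2021/',
    'https://www.basketball-reference.com/players/j/jamesle01/gamelog/2021/',
    'https://www.basketball-reference.com/players/d/davisan02/gamelog/2021/',
]

dfs = []
for link in player_link_list:
    # Take the last table found on the page.
    df = pd.read_html(link)[-1]
    # Drop the header rows that are repeated inside the table.
    df = df[df['Rk'].ne('Rk')]
    # Drop games the player was inactive for.
    df = df[df['PTS'].ne('Inactive')].copy()
    # The player ID is the fourth-from-last segment of the URL.
    df['Player'] = link.split('/')[-4]
    # Convert points to numbers; any remaining non-numeric entries become NaN.
    df['PTS'] = pd.to_numeric(df['PTS'], errors='coerce')
    dfs.append(df)

df = pd.concat(dfs, ignore_index=True)

# 10-game rolling mean per player; set min_periods=1 to fill the first 9 games too.
df['rolling_10_avg'] = df.groupby('Player')['PTS'].transform(
    lambda s: s.rolling(10, min_periods=10).mean())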
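For a layout like the one in the question (every player listed for each date, then the next date), the same idea can be sketched without scraping anything. This is only an illustration under assumptions: the column names date, player and points, the helper add_rolling_mean, and the sample values are all made up, and the window is shrunk to 2 games so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical sample data laid out as in the question:
# all players for one date, then all players for the next date.
games = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-01-01", "2021-01-01",
        "2021-01-02", "2021-01-02",
        "2021-01-03", "2021-01-03",
    ]),
    "player": ["A", "B", "A", "B", "A", "B"],
    "points": [10, 20, 20, 30, 30, 10],
})


def add_rolling_mean(df, window, min_periods=1):
    """Return a copy of df with a per-player rolling mean of points.

    add_rolling_mean is an illustrative helper, not part of pandas.
    """
    # Sort by date so each player's games are in chronological order.
    out = df.sort_values("date", kind="stable").copy()
    # transform keeps the original row index, so the result lines up
    # with the rows of out and can be assigned directly as a column.
    out["rolling_avg"] = out.groupby("player")["points"].transform(
        lambda s: s.rolling(window, min_periods=min_periods).mean()
    )
    return out


result = add_rolling_mean(games, window=2)
print(result)
```

With min_periods=1, each player's first game gets its own score as the average instead of NaN. Note that the window here counts games (rows), not calendar days.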