
Calculating Moving Average of a column based on values of another column in a dataframe Python (Pandas)

I am trying to create a column with the 10-day moving average of points for NBA players. My dataframe has game-by-game statistics for each player, and I would like the moving average column to contain the 10-day moving average as of each game. I have tried df.groupby('player')['points'].rolling(10, 1).mean(), but this just gives me the number of points scored on that day as the moving average. All of the players from each day are listed and then the dataframe moves on to the following day, so I could have a couple hundred rows with the same date but different players' stats. Any help would be greatly appreciated. Thanks.

As stated, you really should provide a sample dataset and show what you are trying to achieve. However, I love working with sports data, so I don't mind putting in the minute or so to get a sample set.

So basically you need to do a rolling mean on a groupby. With a window of 10 and min_periods=10, the first 9 rows of each player are NaN, because there are not yet 10 rows to take the mean of. You can change that by setting min_periods to 1. Also, when you do this, make sure your data is sorted by date (which here it already is).
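
To see what min_periods does in isolation, here is a minimal sketch on a made-up series. The values and the window of 3 are illustrative assumptions, not taken from the game logs:

```python
import pandas as pd

# Made-up points for one player, already in date order
s = pd.Series([10, 20, 30, 40])

# Require a full window: the first 2 results are NaN for a window of 3
strict = s.rolling(3, min_periods=3).mean()

# Allow partial windows: every row gets the mean of whatever is available so far
lenient = s.rolling(3, min_periods=1).mean()

print(strict.tolist())
print(lenient.tolist())
```

The same trade-off applies to a window of 10: a full-window requirement leaves the early games empty, while min_periods=1 fills them with averages over fewer games.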

import pandas as pd

player_link_list = ['https://www.basketball-reference.com/players/l/lavinza01/gamelog/2021/',
                    'https://www.basketball-reference.com/players/v/vucevni01/gamelog/2021/',
                    'https://www.basketball-reference.com/players/j/jamesle01/gamelog/2021/',
                    'https://www.basketball-reference.com/players/d/davisan02/gamelog/2021/']

dfs = []
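for link in player_link_list:
    # The game log is taken to be the last table on the page
    df = pd.read_html(link)[-1]
    # Drop the header rows that are repeated inside the table
    df = df[df['Rk'].ne('Rk')]
    # Keep only rows where PTS is numeric (drops 'Inactive' and similar rows)
    df = df[pd.to_numeric(df['PTS'], errors='coerce').notna()].copy()
    # The player id is the fourth path segment from the end of the URL
    df['Player'] = link.split('/')[-4]
    df['PTS'] = df['PTS'].astype(int)
    dfs.append(df)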

df = pd.concat(dfs)

# 10-game rolling mean per player; min_periods=10 leaves each player's first 9 games as NaN
df['rolling_10_avg'] = df.groupby('Player')['PTS'].transform(lambda s: s.rolling(10, min_periods=10).mean())
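
If you want to try the same pattern without scraping anything, here is a small sketch on made-up data laid out the way the question describes (players interleaved, not necessarily in date order). The column names, the values and the window of 3 are assumptions for illustration only:

```python
import pandas as pd

# Made-up game log with two hypothetical players, 'a' and 'b'
games = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-03', '2021-01-01', '2021-01-01',
                            '2021-01-02', '2021-01-02', '2021-01-03']),
    'Player': ['a', 'a', 'b', 'a', 'b', 'b'],
    'PTS': [30, 10, 4, 20, 8, 12],
})

# Sort so that each player's games are in date order before rolling
games = games.sort_values(['Player', 'Date']).reset_index(drop=True)

# Rolling mean within each player; min_periods=1 allows partial windows
games['rolling_avg'] = (
    games.groupby('Player')['PTS']
         .transform(lambda s: s.rolling(3, min_periods=1).mean())
)

print(games)
```

Because transform returns a result aligned to the original rows, the new column can be assigned straight back to the dataframe. With the real game logs you would use a window of 10 instead of 3.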

