Pandas calculating rolling sum with specific rows and columns

I have quite a specific problem.

I'm trying to analyse some historic football data and want to create a couple of columns with each team's most recent goal tallies, for both home and away. I've tried to simplify things here; let's say the df looks like this:

import pandas as pd

df = pd.DataFrame({'Home':['A','B','C','B','A','A','C'],'Away':['B','C','A','C','B','B','A'],
                   'HG':[1,2,3,2,1,4,1],'AG':[2,4,5,1,3,2,2]})
  Home Away  HG  AG
0    A    B   1   2
1    B    C   2   4
2    C    A   3   5
3    B    C   2   1
4    A    B   1   3
5    A    B   4   2
6    C    A   1   2

What I want to do is sum the two most recent goal numbers (HG and/or AG) for both Home and Away, for each row in the df. But I obviously don't want to take the row itself into account.

So if we look at index row 0, Home is 'A'. The number I would expect to get back is 6: 5 from index row 2 under AG, as A is the away team in that row, and 1 from index row 4 under HG, as A is the home team there, giving 6 in total. For the Away team B in index row 0 I would expect the result to be 4, from index row 1 and index row 3, and so forth. I would also like to return np.NaN when there are fewer than 2 data points to calculate from.

I initially thought of writing a little function to help do this, something similar to the following, but obviously this is woefully incorrect:

def get_rolling_sum(x):
    count_list = []
    new_df = df[(df['Home'] == str(x)) | (df['Away'] == str(x))]
    for i in range(0,len(new_df)):
        if new_df['Home'].iloc[i] == str(x):
            count_list.append(new_df['HG'].iloc[i])
        elif new_df['Away'].iloc[i] == str(x):
            count_list.append(new_df['AG'].iloc[i])
df['Roll_Home'] = [get_rolling_sum(x) for x in df['Home']]

What I'm hoping to get is something like this:

  Home Away  HG  AG  Expected_Home
0    A    B   1   2            6.0
1    B    C   2   4            5.0
2    C    A   3   5            2.0
3    B    C   2   1            5.0
4    A    B   1   3            6.0
5    A    B   4   2            NaN
6    C    A   1   2            NaN

Many thanks

First, add a column to the DataFrame so that the row index is available. Then create a stacked DataFrame in which the Home and Away columns become a single column and the HG and AG columns become a single column, while keeping the original index intact. In effect, the Home and Away values from each row of the original df become two successive rows. Then, for each row, take the first two rows of the stacked DataFrame for that team whose reference index is greater than the row's own index, and add up their goals. (You have to set the last two rows to NaN manually.)

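import pandas as pd

df = pd.DataFrame({'Home':['A','B','C','B','A','A','C'],'Away':['B','C','A','C','B','B','A'],
               'HG':[1,2,3,2,1,4,1],'AG':[2,4,5,1,3,2,2]})[['Home', 'Away', 'HG', 'AG']]
# Keep the row index as an ordinary column so it is available inside apply()
df['ref_index'] = df.index

# Stack Home/HG and Away/AG into one Loc/Goals frame; the stable sort keeps
# each match's Home row directly above its Away row
df_stack = pd.concat([df[['Home', 'HG']].rename(columns = {'Home':'Loc', 'HG':'Goals'}), 
                  df[['Away', 'AG']].rename(columns = {'Away':'Loc', 'AG':'Goals'})]).sort_index(kind='mergesort')
df_stack['ref_index'] = df_stack.index

# For each row, sum the goals from the home team's next two appearances further down the frame
df['Expected_Home'] = df.apply(lambda row: df_stack[(df_stack.Loc == row['Home']) & 
                                                (df_stack.ref_index > row['ref_index'])].iloc[:2].Goals.sum(),
                           axis = 1)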

print(df)

  Home Away  HG  AG  ref_index  Expected_Home
0    A    B   1   2          0              6
1    B    C   2   4          1              5
2    C    A   3   5          2              2
3    B    C   2   1          3              5
4    A    B   1   3          4              6
5    A    B   4   2          5              2
6    C    A   1   2          6              0
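
The last two values above (2 and 0) come from teams with fewer than two later matches. One possible way to avoid patching them by hand is the `min_count` argument of `Series.sum`, which returns NaN when fewer than that many values are available. The sketch below assumes the same stacked layout as above; the helper name `next_two_goals` and the `Expected_Away` column are illustrative additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Home': ['A', 'B', 'C', 'B', 'A', 'A', 'C'],
    'Away': ['B', 'C', 'A', 'C', 'B', 'B', 'A'],
    'HG': [1, 2, 3, 2, 1, 4, 1],
    'AG': [2, 4, 5, 1, 3, 2, 2],
})

# One row per (match, team): the team's name and the goals it scored.
home = df[['Home', 'HG']].rename(columns={'Home': 'Loc', 'HG': 'Goals'})
away = df[['Away', 'AG']].rename(columns={'Away': 'Loc', 'AG': 'Goals'})
df_stack = pd.concat([home, away]).sort_index(kind='mergesort')
df_stack['ref_index'] = df_stack.index


def next_two_goals(team, idx):
    """Hypothetical helper: goals from the team's next two matches below row idx."""
    later = df_stack[(df_stack['Loc'] == team) & (df_stack['ref_index'] > idx)]
    # min_count=2 makes sum() return NaN when fewer than two values are present.
    return later['Goals'].iloc[:2].sum(min_count=2)


df['Expected_Home'] = [next_two_goals(t, i) for i, t in zip(df.index, df['Home'])]
df['Expected_Away'] = [next_two_goals(t, i) for i, t in zip(df.index, df['Away'])]
print(df)
```

With this change the last two rows of Expected_Home should come out as NaN, as requested in the question, and the same helper covers the away side. It still filters the stacked frame once per row, so it is a readability-first sketch rather than a fast solution for large frames.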
