Group by in Python Pandas where the condition is not just row-wise

Question

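I have two DataFrames, one holding data at a detailed level and one summarizing it at a higher level. I am trying to add a new column to the summary table that totals the overall spending of everyone who follows a given sport. That is, in the Soccer row of the summary I do not want the total spent on soccer, but the total sports spending of anyone who spent anything on soccer.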

import pandas as pd

df = pd.DataFrame({'Person': [1, 2, 3, 3, 3],
                   'Sport': ['Soccer', 'Tennis', 'Tennis', 'Football', 'Soccer'],
                   'Ticket_Cost': [10, 20, 10, 10, 20]})

df2 = pd.DataFrame({'Sport': ['Soccer','Tennis','Football']})

I can currently do this in many steps, but I am sure there is a more efficient/faster way. Here is how I do it at the moment.

# Calculate the total spend for each person in a temporary dataframe
df_intermediate = df.groupby(['Person'])['Ticket_Cost'].sum()
df_intermediate = df_intermediate.rename("Total_Sports_Spend")

Person  Total_Sports_Spend
1       10
2       20
3       40

# Place this total in the detailed table
df = pd.merge(df, df_intermediate, how='left', on='Person')

# Create a second temporary dataframe
df_intermediate2 = df.groupby(['Sport'])['Total_Sports_Spend'].sum()

Sport     Total_Sports_Spend
Football  40
Soccer    50
Tennis    60

# Merge this table with the summary table
df2 = pd.merge(df2, df_intermediate2, how='left', on='Sport')

   Sport     Total_Sports_Spend
0  Soccer    50
1  Tennis    60
2  Football  40

Finally, I clean up the temporary DataFrames and drop the extra column from the detailed table. I am sure there is a better way.

Answer 1

You may want to pivot your DataFrame into a 2D table:

# aggfunc='sum' so repeated purchases add up (the default is the mean)
df2 = df.pivot_table(index='Person', columns='Sport', values='Ticket_Cost', aggfunc='sum')

You get:

Sport   Football    Soccer  Tennis
Person          
     1  NaN         10.0    NaN
     2  NaN         NaN     20.0
     3  10.0        20.0    10.0
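One thing worth checking at this step: `pivot_table` aggregates duplicate Person/Sport pairs with the mean unless told otherwise. The sample data has no duplicates, so it makes no difference there. The sketch below uses made-up data (person 3 buying two Soccer tickets, which is not in the question) to show the difference. The names `purchases`, `by_mean` and `by_sum` are illustrative only.

```python
import pandas as pd

# Hypothetical data: person 3 buys two Soccer tickets (not in the question's sample).
purchases = pd.DataFrame({
    'Person': [1, 3, 3, 3],
    'Sport': ['Soccer', 'Soccer', 'Soccer', 'Tennis'],
    'Ticket_Cost': [10, 20, 30, 10],
})

# Default aggregation is the mean, so the two Soccer tickets collapse to their average.
by_mean = purchases.pivot_table(index='Person', columns='Sport',
                                values='Ticket_Cost')

# aggfunc='sum' keeps the full amount spent per person and sport.
by_sum = purchases.pivot_table(index='Person', columns='Sport',
                               values='Ticket_Cost', aggfunc='sum')

mean_cell = by_mean.loc[3, 'Soccer']  # average of 20 and 30
sum_cell = by_sum.loc[3, 'Soccer']    # 20 + 30
```

If the real data can contain more than one ticket per person and sport, the summed version is probably the one you want for a spending total.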

Now you can compute each person's total spend:

total = df2.sum(axis=1)

which is:

Person
1    10.0
2    20.0
3    40.0
dtype: float64

Next, place the total spend values from total into the cells of df2 that hold a positive value:

df3 = (df2>0).mul(total, axis=0)

The result is:

Sport   Football    Soccer  Tennis
Person          
     1  0.0         10.0    0.0
     2  0.0         0.0     20.0
     3  40.0        40.0    40.0
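In case the masking step looks like magic, here is a small sketch of what happens. The comparison `> 0` yields a boolean table (NaN compares as False), and `.mul(..., axis=0)` lines the per-person totals up with the row index, so True becomes that person's total and False becomes 0. The pivoted table is typed in by hand, and the names `pivot`, `mask` and `weighted` are mine, not from the answer.

```python
import numpy as np
import pandas as pd

# Pivoted spend per person and sport, typed in to match the table shown earlier.
pivot = pd.DataFrame(
    {'Football': [np.nan, np.nan, 10.0],
     'Soccer': [10.0, np.nan, 20.0],
     'Tennis': [np.nan, 20.0, 10.0]},
    index=pd.Index([1, 2, 3], name='Person'),
)

total = pivot.sum(axis=1)           # per-person total; NaN cells are skipped

mask = pivot > 0                    # NaN > 0 evaluates to False
weighted = mask.mul(total, axis=0)  # True -> that person's total, False -> 0

print(weighted)
```

If a zero-cost ticket should still count as following the sport, `pivot.notna()` would be the mask to use instead. That is an assumption about intent, since the question does not say.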

Finally, you just need to sum each column to get what you want:

spending = df3.sum(axis=0)

This gives Football 40, Soccer 50 and Tennis 60, which is the result you expect.
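Putting the steps together, a minimal end-to-end sketch could look like this. Since the steps above reuse the name df2 for the pivoted table, the sketch calls the pivot `wide` and the question's summary table `summary` (both names are mine). Using `Series.map` to attach the result to the summary table is one possible way to do it, not the only one.

```python
import pandas as pd

df = pd.DataFrame({'Person': [1, 2, 3, 3, 3],
                   'Sport': ['Soccer', 'Tennis', 'Tennis', 'Football', 'Soccer'],
                   'Ticket_Cost': [10, 20, 10, 10, 20]})

# The summary table from the question (called df2 there).
summary = pd.DataFrame({'Sport': ['Soccer', 'Tennis', 'Football']})

# One row per person, one column per sport.
wide = df.pivot_table(index='Person', columns='Sport',
                      values='Ticket_Cost', aggfunc='sum')

# Total spend per person, then credit that total to every sport the person paid for.
total = wide.sum(axis=1)
spending = (wide > 0).mul(total, axis=0).sum(axis=0)

# Look up each sport's figure; the column name is borrowed from the question.
summary['Total_Sports_Spend'] = summary['Sport'].map(spending)

print(summary)
```

No temporary columns are added to df along the way, so there is nothing to clean up afterwards.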

Solution 1
0 2020-09-06 15:01:21