How do I group a dataframe by values common to multiple columns?

Question

I am trying to aggregate a dataframe based on values that are found in two columns: rows that have some value X in either column A or column B should be aggregated together.

More concretely, I am trying to do something like this. Let's say I have a dataframe gameStats:

awayTeam  homeTeam  awayGoals  homeGoals
Chelsea   Barca     1          2
R. Madrid Barca     2          5
Barca     Valencia  2          2
Barca     Sevilla   1          0

... and so on ...

I want to construct a dataframe such that among my rows I would have something like:

team    goalsFor  goalsAgainst
Barca   10        5

One obvious solution, since the set of unique elements is small, is something like this:

for team in teamList:
    aggregateDf = gameStats[(gameStats['homeTeam'] == team) | (gameStats['awayTeam'] == team)]
    # do other manipulations of the data then append it to a final dataframe

However, going through a loop seems inelegant. And since I have had this problem before with many unique identifiers, I was wondering if there is a way to do this without a loop, as that seems very inefficient to me.

Answer 1

The solution has two parts: first compute the goals for each team when they play away and when they play at home, then combine them. Something like:

# Select several columns with a list (double brackets); tuple-style selection fails in recent pandas.
goals_when_away = gameStats.groupby(['awayTeam'])[['awayGoals', 'homeGoals']].agg('sum').reset_index().sort_values('awayTeam')
goals_when_home = gameStats.groupby(['homeTeam'])[['homeGoals', 'awayGoals']].agg('sum').reset_index().sort_values('homeTeam')
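
To make the two intermediate frames concrete, here is a small sketch that runs the same groupby on the four sample rows from the question. The sample frame is rebuilt by hand, and the names away_teams and home_teams are introduced only for illustration.

```python
import pandas as pd

# The four sample rows from the question, rebuilt by hand.
gameStats = pd.DataFrame({
    "awayTeam": ["Chelsea", "R. Madrid", "Barca", "Barca"],
    "homeTeam": ["Barca", "Barca", "Valencia", "Sevilla"],
    "awayGoals": [1, 2, 2, 1],
    "homeGoals": [2, 5, 2, 0],
})

# Column order is (goals for, goals against) from each team's point of view.
goals_when_away = (
    gameStats.groupby("awayTeam")[["awayGoals", "homeGoals"]].sum().reset_index()
)
goals_when_home = (
    gameStats.groupby("homeTeam")[["homeGoals", "awayGoals"]].sum().reset_index()
)

# Hypothetical helper names, used only to inspect which teams each frame holds.
away_teams = goals_when_away["awayTeam"].tolist()
home_teams = goals_when_home["homeTeam"].tolist()
print(goals_when_away)
print(goals_when_home)
```

On these four rows only Barca appears in both frames, so the row order of the two frames does not line up team by team.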

Then combine them:

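import pandas as pd
# Positional sum: assumes both frames list the same teams in the same order,
# i.e. every team appears at least once as home and at least once as away.
np_result = goals_when_away.iloc[:, 1:].values + goals_when_home.iloc[:, 1:].values
pd_result = pd.DataFrame(np_result, columns=['goal_for', 'goal_against'])
result = pd.concat([goals_when_away.iloc[:, :1], pd_result], axis=1, ignore_index=True)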

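Note the use of .values when summing, which gives the result as a NumPy array, and ignore_index=True in concat. Both are there to avoid a pandas pitfall: when pandas adds or concatenates DataFrames it aligns them by column and index names, and the two frames here have different column names. Be aware that with axis=1, ignore_index=True also relabels the resulting columns 0, 1, 2.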
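
If some teams appear only as home or only as away, a positional sum no longer matches rows by team. As a hedged alternative (not the answer's original method), the sketch below keeps the team name as the index, renames both frames to common column labels, and lets DataFrame.add align them by team. The sample frame is rebuilt from the question, and the names as_away and as_home are introduced only for illustration.

```python
import pandas as pd

# The four sample rows from the question, rebuilt by hand.
gameStats = pd.DataFrame({
    "awayTeam": ["Chelsea", "R. Madrid", "Barca", "Barca"],
    "homeTeam": ["Barca", "Barca", "Valencia", "Sevilla"],
    "awayGoals": [1, 2, 2, 1],
    "homeGoals": [2, 5, 2, 0],
})

# Per-team sums in each role, with the team left as the index and the
# columns renamed to labels shared by both frames.
as_away = (
    gameStats.groupby("awayTeam")[["awayGoals", "homeGoals"]].sum()
    .rename(columns={"awayGoals": "goalsFor", "homeGoals": "goalsAgainst"})
)
as_home = (
    gameStats.groupby("homeTeam")[["homeGoals", "awayGoals"]].sum()
    .rename(columns={"homeGoals": "goalsFor", "awayGoals": "goalsAgainst"})
)

# add() aligns on index and column labels; fill_value=0 treats a team that
# is missing from one frame as having scored and conceded nothing there.
result = (
    as_away.add(as_home, fill_value=0)
    .astype(int)
    .rename_axis("team")
    .reset_index()
)
print(result)
```

For the sample rows this gives Barca 10 goals for and 5 against, matching the row the question asks for, and it also keeps teams such as Sevilla that appear in only one role.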
