简体   繁体   中英

How do I groupby a dataframe based on values that are common to multiple columns?

I am trying to aggregate a dataframe based on values that are found in two columns. I am trying to aggregate the dataframe such that the rows that have some value X in either column A or column B are aggregated together.

More concretely, I am trying to do something like this. Let's say I have a dataframe gameStats:

awayTeam  homeTeam  awayGoals  homeGoals
Chelsea   Barca     1          2
R. Madrid Barca     2          5
Barca     Valencia  2          2
Barca     Sevilla   1          0

... and so on

I want to construct a dataframe such that among my rows I would have something like:

team    goalsFor  goalsAgainst
Barca   10        5

One obvious solution, since the set of unique elements is small, is something like this:

for team in teamList:
    aggregateDf = gameStats[(gameStats['homeTeam'] == team) | (gameStats['awayTeam'] == team)]
# do other manipulations of the data then append it to a final dataframe

However, going through a loop seems less elegant. And since I have had this problem before with many unique identifiers, I was wondering if there was a way to do this without using a loop as that seems very inefficient to me.

The solution is 2 folds, first compute goals for each team when they are home and away, then combine them. Something like:

goals_when_away = gameStats.groupby(['awayTeam'])['awayGoals', 'homeGoals'].agg('sum').reset_index().sort_values('awayTeam')
goals_when_home = gameStats.groupby(['homeTeam'])['homeGoals', 'awayGoals'].agg('sum').reset_index().sort_values('homeTeam')

then combine them

np_result = goals_when_away.iloc[:, 1:].values + goals_when_home.iloc[:, 1:].values
pd_result = pd.DataFrame(np_result, columns=['goal_for', 'goal_against'])
result = pd.concat([goals_when_away.iloc[:, :1], pd_result], axis=1, ignore_index=True)

Note .values when summing to get result in numpy array, and ignore_index=True when concat, these are to avoid pandas trap when it sums by column and index names.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM