如何使用 Pandas 根据数据框中另一列的值获取 2 列的总和

Question

I have a dataset with a lot of columns and I already have filtered the ones that I need:我有一个包含很多列的数据集，并且我已经过滤了我需要的那些：

import pandas as pd

data = pd.read_csv("./data/GL1871.txt", header=None, usecols=[3,6,9,10])
new_data = data.rename(columns={3: 'Away', 6: 'Home', 9: 'Away runs', 10: 'Home runs'})

What I want to get out of this dataframe is the sum of Column 'Away runs' and 'Home runs' for each team.我想从这个数据框中得到的是每个团队的“客场跑”和“本垒打”列的总和。 The output should look something like this:输出应如下所示：

0  CL1 364
1  BS1 254
...
9  CH1 190

So far I have tried the groupby() method and the output is not what I really need:到目前为止，我已经尝试了 groupby() 方法，但输出并不是我真正需要的：

runs_away = new_data.groupby( 'Home')['Away runs'].sum()
runs_away = new_data.groupby( 'Home')['Home runs'].sum()

Home
BS1    165
CH1    139
CL1    162
FW1    112
NY2    127
PH1    124
RC1     66
TRO    231
WS3    120
Name: Away runs, dtype: int64
Home
BS1    223
CH1    197
CL1    119
FW1     78
NY2    178
PH1    180
RC1     72
TRO    200
WS3    166
Name: Home runs, dtype: int64

Is there any smart way to do this and getting both values at the same time?有没有什么聪明的方法可以做到这一点并同时获得两个值？ Maybe a comprehension would be better, but I don't know how to iterate a Dataframe.也许理解会更好，但我不知道如何迭代 Dataframe。 Thank you in advance.先感谢您。

Also, the expected result is to get the sum of the runs in Away runs and Home runs for the same team.此外，预期结果是获得同一团队的客场跑和本垒打的总和。 So sum Away runs where the Away is team x + Home runs where Home is team x as well所以总和客场跑，其中客队是 x 队 + 本垒打，主场也是 x 队

Adding sample from the dataset:从数据集中添加样本：

    Away Home  Away runs  Home runs
0    CL1  FW1          0          2
1    BS1  WS3         20         18
2    CL1  RC1         12          4
3    CL1  CH1         12         14
4    BS1  TRO          9          5

Desired output:期望的输出：

0  CL1 364 
1  BS1 254
...
9  CH1 190

#Where 364 is the sum of all runs of the team CL1 wether it was away or home #其中 364 是 CL1 团队所有运行的总和，无论是客场还是主场

       Away Home  Away runs  Home runs
0   CL1  FW1          0          2
1   BS1  WS3         20         18
2   CL1  RC1         12          4
3   CL1  CH1         12         14
4   BS1  TRO          9          5
5   CH1  CL1         18         10
6   WS3  CL1         12          8
7   CH1  FW1         14          5
8   WS3  FW1          6         12
9   TRO  BS1         29         14
10  WS3  CH1          4         14

Answer 1

It depends on how the dataframe is structured,but if the run_aways are pandas series, you can simply add them.这取决于数据框的结构，但如果 run_aways 是熊猫系列，您可以简单地添加它们。 You could also do it in one line like :您也可以在一行中完成，例如：

runs = new_data.groupby( 'Home')['Away runs'].sum() + new_data.groupby( 'Home')['Home runs'].sum()

You could also use a mask and the .apply method but I find it les readable.你也可以使用掩码和 .apply 方法，但我觉得它不太可读。

Answer 2

Pretty sure I understand what you're trying to do.我很确定我明白你想要做什么。 You'll need to do some reshaping of your data to stack the teams and runs so that they're in 2 long columns.您需要对数据进行一些重塑以堆叠团队并运行，以便它们位于 2 个长列中。 Then you can perform a groupby operation to get the total number of runs per team.然后您可以执行 groupby 操作以获取每个团队的总运行次数。

away = df[["Away", "Away runs"]]
home = df[["Home", "Home runs"]]

new_df = pd.DataFrame(np.vstack([home, away]), columns=["team", "runs"])

print(new_df)
   team runs
0   FW1    2
1   WS3   18
2   RC1    4
3   CH1   14
4   TRO    5
5   CL1   10
6   CL1    8
7   FW1    5
8   FW1   12
9   BS1   14
10  CH1   14
11  CL1    0
12  BS1   20
13  CL1   12
14  CL1   12
15  BS1    9
16  CH1   18
17  WS3   12
18  CH1   14
19  WS3    6
20  TRO   29
21  WS3    4

Now that the teams are all in a single column, and their corresponding runs in the other.现在团队都在一个列中，而他们相应的运行在另一个列中。 We can perform a simple groupby operation to calculate the total number of runs per team, ignoring whether they were "home" or "away"我们可以执行一个简单的groupby操作来计算每支球队的总跑步次数，忽略他们是"home"还是"away"

team_runs = new_df.groupby("team", as_index=False)["runs"].sum()

print(team_runs)
  team  runs
0  BS1    43
1  CH1    60
2  CL1    42
3  FW1    19
4  RC1     4
5  TRO    34
6  WS3    40

如何使用 Pandas 根据数据框中另一列的值获取 2 列的总和

问题描述

2 个解决方案

解决方案1
1 2020-10-22 15:40:02

解决方案2
0 已采纳 2020-10-23 17:04:29

如何使用 Pandas 根据数据框中另一列的值获取 2 列的总和

问题描述

2 个解决方案

解决方案1 1 2020-10-22 15:40:02

解决方案2 0 已采纳 2020-10-23 17:04:29

解决方案1
1 2020-10-22 15:40:02

解决方案2
0 已采纳 2020-10-23 17:04:29