
How do I use Pandas to get the sum of 2 columns based on the value of another column in a dataframe

I have a dataset with a lot of columns, and I have already filtered the ones that I need:

import pandas as pd

data = pd.read_csv("./data/GL1871.txt", header=None, usecols=[3,6,9,10])
new_data = data.rename(columns={3: 'Away', 6: 'Home', 9: 'Away runs', 10: 'Home runs'})

What I want to get out of this dataframe is the sum of the 'Away runs' and 'Home runs' columns for each team. The output should look something like this:

0  CL1 364
1  BS1 254
...
9  CH1 190

So far I have tried the groupby() method, but the output is not what I really need:

runs_away = new_data.groupby('Home')['Away runs'].sum()
runs_home = new_data.groupby('Home')['Home runs'].sum()

Home
BS1    165
CH1    139
CL1    162
FW1    112
NY2    127
PH1    124
RC1     66
TRO    231
WS3    120
Name: Away runs, dtype: int64
Home
BS1    223
CH1    197
CL1    119
FW1     78
NY2    178
PH1    180
RC1     72
TRO    200
WS3    166
Name: Home runs, dtype: int64

Is there a smart way to do this and get both values at the same time? Maybe a comprehension would be better, but I don't know how to iterate over a DataFrame. Thank you in advance.

Also, the expected result is the sum of the runs in 'Away runs' and 'Home runs' for the same team. That is, sum 'Away runs' where Away is team x, plus 'Home runs' where Home is team x as well.

Adding a sample from the dataset:

    Away Home  Away runs  Home runs
0    CL1  FW1          0          2
1    BS1  WS3         20         18
2    CL1  RC1         12          4
3    CL1  CH1         12         14
4    BS1  TRO          9          5

Desired output:

0  CL1 364 
1  BS1 254
...
9  CH1 190

# Where 364 is the sum of all runs of the team CL1, whether it was away or home

   Away Home  Away runs  Home runs
0   CL1  FW1          0          2
1   BS1  WS3         20         18
2   CL1  RC1         12          4
3   CL1  CH1         12         14
4   BS1  TRO          9          5
5   CH1  CL1         18         10
6   WS3  CL1         12          8
7   CH1  FW1         14          5
8   WS3  FW1          6         12
9   TRO  BS1         29         14
10  WS3  CH1          4         14

It depends on how the dataframe is structured, but if the results of the two groupby() calls are pandas Series, you can simply add them. You could also do it in one line like:

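# Away runs are grouped by 'Away' and home runs by 'Home'; fill_value=0 keeps teams that appear in only one Series
runs = new_data.groupby('Away')['Away runs'].sum().add(new_data.groupby('Home')['Home runs'].sum(), fill_value=0)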
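One caveat when adding two grouped Series: pandas aligns them on the index, so a team that appears in only one of them comes out as NaN with a plain `+`. Here is a minimal sketch on the 11-row sample from the question, assuming away runs should be grouped by 'Away' and home runs by 'Home' as the question's clarification describes. The variable names are only illustrative.

```python
import pandas as pd

# The 11-row sample from the question
df = pd.DataFrame({
    "Away": ["CL1", "BS1", "CL1", "CL1", "BS1", "CH1", "WS3", "CH1", "WS3", "TRO", "WS3"],
    "Home": ["FW1", "WS3", "RC1", "CH1", "TRO", "CL1", "CL1", "FW1", "FW1", "BS1", "CH1"],
    "Away runs": [0, 20, 12, 12, 9, 18, 12, 14, 6, 29, 4],
    "Home runs": [2, 18, 4, 14, 5, 10, 8, 5, 12, 14, 14],
})

away_runs = df.groupby("Away")["Away runs"].sum()  # runs scored as the away team
home_runs = df.groupby("Home")["Home runs"].sum()  # runs scored as the home team

# Plain + leaves NaN for teams missing from one side (FW1 and RC1 never play away here)
naive = away_runs + home_runs

# Series.add with fill_value=0 treats a missing side as zero runs;
# the result is float after alignment, so cast back to int
total = away_runs.add(home_runs, fill_value=0).astype(int)
print(total)
```

In the full dataset every team probably plays both home and away, in which case both versions agree, but `fill_value=0` is the safer default.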

You could also use a mask and the .apply method, but I find it less readable.
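For comparison, the mask idea could look roughly like this. It is only a sketch on the question's sample data: `team_total` is a hypothetical helper, and it uses a dict comprehension rather than .apply, since the question asked about comprehensions.

```python
import pandas as pd

# The 11-row sample from the question
df = pd.DataFrame({
    "Away": ["CL1", "BS1", "CL1", "CL1", "BS1", "CH1", "WS3", "CH1", "WS3", "TRO", "WS3"],
    "Home": ["FW1", "WS3", "RC1", "CH1", "TRO", "CL1", "CL1", "FW1", "FW1", "BS1", "CH1"],
    "Away runs": [0, 20, 12, 12, 9, 18, 12, 14, 6, 29, 4],
    "Home runs": [2, 18, 4, 14, 5, 10, 8, 5, 12, 14, 14],
})

def team_total(frame, team):
    """Hypothetical helper: runs scored by `team`, away plus home."""
    away_mask = frame["Away"] == team  # rows where the team played away
    home_mask = frame["Home"] == team  # rows where the team played at home
    return int(frame.loc[away_mask, "Away runs"].sum()
               + frame.loc[home_mask, "Home runs"].sum())

# Every team that appears in either column
teams = sorted(set(df["Away"]) | set(df["Home"]))
totals = {team: team_total(df, team) for team in teams}
print(totals)
```

This scans the dataframe once per team, so it is likely slower than a groupby on a large dataset, but each step is easy to check by hand.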

I'm pretty sure I understand what you're trying to do. You'll need to reshape your data to stack the teams and runs so that they're in 2 long columns. Then you can perform a groupby operation to get the total number of runs per team.

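import numpy as np

# df is the dataframe with the renamed columns (new_data in the question)
away = df[["Away", "Away runs"]]
home = df[["Home", "Home runs"]]

# Stack the home rows on top of the away rows to get two long columns
new_df = pd.DataFrame(np.vstack([home, away]), columns=["team", "runs"])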

print(new_df)
   team runs
0   FW1    2
1   WS3   18
2   RC1    4
3   CH1   14
4   TRO    5
5   CL1   10
6   CL1    8
7   FW1    5
8   FW1   12
9   BS1   14
10  CH1   14
11  CL1    0
12  BS1   20
13  CL1   12
14  CL1   12
15  BS1    9
16  CH1   18
17  WS3   12
18  CH1   14
19  WS3    6
20  TRO   29
21  WS3    4

Now the teams are all in a single column, with their corresponding runs in the other. We can perform a simple groupby operation to calculate the total number of runs per team, ignoring whether they were "home" or "away":

team_runs = new_df.groupby("team", as_index=False)["runs"].sum()

print(team_runs)
  team  runs
0  BS1    43
1  CH1    60
2  CL1    42
3  FW1    19
4  RC1     4
5  TRO    34
6  WS3    40
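As a possible variation on the reshaping step, the same stacking can be sketched with pd.concat instead of np.vstack. This is a swapped-in technique, not the code above: it keeps the "runs" column as integers, whereas stacking mixed string and integer columns through NumPy yields an object array. The sample data is the 11 rows from the question, and `long_df` is just an illustrative name.

```python
import pandas as pd

# The 11-row sample from the question
df = pd.DataFrame({
    "Away": ["CL1", "BS1", "CL1", "CL1", "BS1", "CH1", "WS3", "CH1", "WS3", "TRO", "WS3"],
    "Home": ["FW1", "WS3", "RC1", "CH1", "TRO", "CL1", "CL1", "FW1", "FW1", "BS1", "CH1"],
    "Away runs": [0, 20, 12, 12, 9, 18, 12, 14, 6, 29, 4],
    "Home runs": [2, 18, 4, 14, 5, 10, 8, 5, 12, 14, 14],
})

# Give both halves the same column names so they stack cleanly
away = df[["Away", "Away runs"]].rename(columns={"Away": "team", "Away runs": "runs"})
home = df[["Home", "Home runs"]].rename(columns={"Home": "team", "Home runs": "runs"})

# Home rows on top of away rows, with a fresh 0..n-1 index
long_df = pd.concat([home, away], ignore_index=True)

# Total runs per team, regardless of venue
team_runs = long_df.groupby("team", as_index=False)["runs"].sum()
print(team_runs)
```

On the sample this should give the same per-team totals as the output shown above.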

