
Python dataframe how to group by one column and get sum of other column

I want to create a new data frame with 2 columns: Striker_Id, which the data is grouped by, and a second column holding the sum of 'Batsman_Scored' for each 'Striker_Id'.

E.g.:

Striker_ID  Batsman_Scored
1            0
2            8 
...


I tried ball.groupby(['Striker_Id'])['Batsman_Scored'].sum(), but this is what I get:

Striker_Id
1      0000040141000010111000001000020000004001010001...
2      0000000446404106064011111011100012106110621402...
3      0000121111114060001000101001011010010001041011...
4      0114110102100100011010000000006010011001111101...
5      0140016010010040000101111100101000111410011000...
6      1100100000104141011141001004001211200001110111...

It doesn't sum; it only joins all the numbers together. What's the alternative?

For some reason, your columns were loaded as strings, so summing them concatenates the values instead of adding them. While loading them from a CSV, try applying a converter:

df = pd.read_csv('file.csv', converters={'Batsman_Scored' : int})

Or,

df = pd.read_csv('file.csv', converters={'Batsman_Scored' : pd.to_numeric})
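As a minimal sketch of the converter approach, the snippet below reads a small in-memory CSV instead of 'file.csv'. The CSV contents are made up for illustration; only the column names come from the question.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for 'file.csv'.
csv_text = "Striker_Id,Batsman_Scored\n1,0\n1,4\n2,6\n2,2\n"

# The converter is called on each raw cell of the named column while parsing.
df = pd.read_csv(io.StringIO(csv_text), converters={"Batsman_Scored": int})

# The values are now numbers, so this adds them instead of joining them.
print(df["Batsman_Scored"].sum())
```

Note that int raises a ValueError on a blank or non-numeric cell, so this variant only helps when every value in the column is a clean integer.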

If that doesn't work, then convert to integer after loading:

df['Batsman_Scored'] = df['Batsman_Scored'].astype(int)

Or,

df['Batsman_Scored'] = pd.to_numeric(df['Batsman_Scored'], errors='coerce')
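The two conversions behave differently on bad input: astype(int) raises an error on a value it cannot parse, while pd.to_numeric with errors='coerce' replaces that value with NaN. A small sketch, using made-up sample values:

```python
import pandas as pd

# Hypothetical raw column with one unparseable entry.
raw = pd.Series(["0", "4", "n/a", "6"], dtype=object)

# Unparseable entries become NaN; the result is a float column.
cleaned = pd.to_numeric(raw, errors="coerce")

print(cleaned.isna().sum())  # number of entries that could not be parsed
print(cleaned.sum())         # NaN is skipped when summing
```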

Now, performing the groupby should work:

r = df.groupby('Striker_Id')['Batsman_Scored'].sum() 
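The groupby above returns a Series indexed by Striker_Id. Since the question asks for a new data frame with two columns, one option is to pass as_index=False so the group key stays a regular column. This is a sketch on made-up sample data; the name totals is only illustrative.

```python
import pandas as pd

# Hypothetical sample data with scores stored as strings, as in the question.
ball = pd.DataFrame({
    "Striker_Id": [1, 1, 2, 2, 2],
    "Batsman_Scored": ["0", "4", "6", "1", "1"],
})

# Convert the string column to numbers before aggregating.
ball["Batsman_Scored"] = pd.to_numeric(ball["Batsman_Scored"], errors="coerce")

# as_index=False keeps Striker_Id as a column, giving a 2-column DataFrame.
totals = ball.groupby("Striker_Id", as_index=False)["Batsman_Scored"].sum()
print(totals)
```

Calling .reset_index() on the Series result is an equivalent way to get the same two-column frame.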

Without access to your data, I can only speculate. But it seems that, at some point, your data contains non-numeric values that prevent pandas from performing the conversion, so those columns are kept as strings. It's a little difficult to pinpoint the problematic data until you actually load it and do something like

df.col.str.isdigit().all()

If that returns False, there are non-numeric items in the column. Note that this only works for non-negative integers; float columns (or values with a sign) cannot be debugged like this.
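Because the isdigit check does not handle floats, one alternative is to let pd.to_numeric attempt the parse and then look at which rows failed. The sketch below assumes a hypothetical column named col with made-up values.

```python
import pandas as pd

# Hypothetical column mixing integers, a float, and one bad value.
df = pd.DataFrame({"col": pd.Series(["1", "22", "x3", "4.5", "7"], dtype=object)})

# Rows that were present in the data but could not be parsed as numbers.
parsed = pd.to_numeric(df["col"], errors="coerce")
bad_mask = parsed.isna() & df["col"].notna()
bad_rows = df[bad_mask]

print(bad_rows)
```

This shows the offending values themselves, which is usually more helpful than a single True/False answer.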

Another way to see which columns have corrupt data is to query dtypes:

df.dtypes

This gives you a listing of all columns and their datatypes. Use it to figure out which columns need parsing:

for c in df.columns[df.dtypes == object]:
    print(c)

You can then apply the methods outlined above to fix them.
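Putting the two steps together, a rough sketch that converts every object column in one pass might look like this. The data is made up, and the Runs column is a hypothetical extra column included only to show that numeric columns are left alone.

```python
import pandas as pd

# Hypothetical frame: two columns loaded as strings, one already numeric.
df = pd.DataFrame({
    "Striker_Id": pd.Series(["1", "2"], dtype=object),
    "Batsman_Scored": pd.Series(["4", "oops"], dtype=object),
    "Runs": [1, 2],
})

# Convert each object column; unparseable values become NaN.
for c in df.columns[df.dtypes == object]:
    df[c] = pd.to_numeric(df[c], errors="coerce")

print(df.dtypes)
```

Coercing blindly like this would also turn genuine text columns into NaN, so it is only appropriate when every object column is expected to be numeric.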
