
Pandas dataframe - sum of each column based on group

Pandas - sum of each column based on group by first column

I have a text file with a Table column and three other columns: Select, Update and Insert. I would like to group by Table, sum each of the other columns, and add a grand total row at the end.

grouped = data.groupby(['Table'])
print(grouped[["Select", "Update", "Insert"]].agg('sum'))

Text file has data in this format
Table Select Update Insert
A        10      8      5
B        12      2      0
C        10      2      4
B        19      3      1
D        13      0      5
A        11      7      3

Expected output
Table Select Update Insert
A        21      15     8
B        31      5      1
C        10      2      4
D        13      0      5
Total    75      22    18

df.groupby with sum isn't aggregating the data properly for every column. Aggregating only one column works, but the output on my terminal is all messed up.

Appreciate your help!

You can try df.groupby(by='Table').sum() for the aggregate table:

       Select  Update  Insert
Table                        
A          21      15       8
B          31       5       1
C          10       2       4
D          13       0       5

And df.groupby(by='Table').sum().sum() for the totals:

Select    75
Update    22
Insert    18
dtype: int64
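
The two results above can be combined into the single table the question asks for. The following is a minimal sketch, assuming the data is already in a DataFrame with the columns from the question; the row label 'Total' is taken from the expected output, and the variable name summary is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Table':  ['A', 'B', 'C', 'B', 'D', 'A'],
    'Select': [10, 12, 10, 19, 13, 11],
    'Update': [8, 2, 2, 3, 0, 7],
    'Insert': [5, 0, 4, 1, 5, 3],
})

# Per-table sums, keeping the column order used in the question.
summary = df.groupby('Table')[['Select', 'Update', 'Insert']].sum()

# Append the column totals as one extra row labelled 'Total'.
summary.loc['Total'] = summary.sum()

# Turn the 'Table' index back into a regular column for display.
print(summary.reset_index())
```

Computing the totals from the grouped frame before adding the new row keeps the 'Total' row out of its own sum.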

You can try using the pandas pivot_table function with margins=True:

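import pandas as pd

data = {'Table': ['A', 'B', 'C', 'B', 'D', 'A'],
        'Select': [10, 12, 10, 19, 13, 11],
        'Update': [8, 2, 2, 3, 0, 7],
        'Insert': [5, 0, 4, 1, 5, 3]}

df = pd.DataFrame(data)

# Group by 'Table' and sum the other columns; margins=True adds a totals row.
df2 = df.pivot_table(index='Table',
                     margins=True,
                     margins_name='Total',  # defaults to 'All'
                     aggfunc='sum')

# Turn 'Table' back into a regular column.
df2.reset_index(inplace=True)

# Select the columns in their original order.
df2[['Table', 'Select', 'Update', 'Insert']]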

And you will get the required output:

   Table  Select  Update  Insert
0      A      21      15       8
1      B      31       5       1
2      C      10       2       4
3      D      13       0       5
4  Total      75      22      18

Hope this helps!

Table                               ...        
A        10      8      5      0.0  ...     0.0
A        11      7      3      0.0  ...     0.0
B        12      2      0      0.0  ...     0.0
B        19      3      1      0.0  ...     0.0
C        10      2      4      0.0  ...     0.0
D        13      0      5      0.0  ...     0.0
Table Select Update Insert     0.0  ...     0.0

[7 rows x 3 columns]

This is the output I get with df.groupby(by='Table').sum().

It appears that when the data is loaded from the .log file, it isn't framed correctly for pandas to process.

This is how the data is being loaded:


df = pd.DataFrame(data)
print(df)

Output of frame I get,

                        Table  ...  Insert
0  Table Select Update Insert  ...     NaN
1   A        10      8      5  ...     NaN
2   B        12      2      0  ...     NaN
3   C        10      2      4  ...     NaN
4   B        19      3      1  ...     NaN
5   D        13      0      5  ...     NaN
6   A        11      7      3  ...     NaN

Compare this with defining the data as a dict, as below:
data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}

output of print df is 
{'Table': ['A', 'B', 'C', 'B', 'D', 'A'], 'Update': [8, 2, 2, 3, 0, 7], 'Select': [10, 12, 10, 19, 13, 11], 'Insert': [5, 0, 4, 1, 5, 3]}

and pivot_table provides the output as expected.
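
The frame printed above, with the whole header line sitting in one cell and NaN elsewhere, suggests that each line of the file was kept as a single string instead of being split into fields. Here is a sketch of one way to load whitespace-separated data, assuming the file looks exactly like the sample in the question. The io.StringIO object is a stand-in for the real .log file, whose path is not given.

```python
import io

import pandas as pd

# Stand-in for the .log file: same layout as the sample in the question.
raw = io.StringIO(
    "Table Select Update Insert\n"
    "A        10      8      5\n"
    "B        12      2      0\n"
    "C        10      2      4\n"
    "B        19      3      1\n"
    "D        13      0      5\n"
    "A        11      7      3\n"
)

# sep=r'\s+' splits each line on any run of whitespace;
# the first line is used as the column header.
df = pd.read_csv(raw, sep=r'\s+')

# The three count columns should now be numeric, so sum() can aggregate them.
print(df.dtypes)
print(df.groupby('Table').sum())
```

With a real file, its path would be passed to pd.read_csv in place of raw.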

jitesh singla: If you don't mind, could you please explain how pivot_table groups by the Table column and aggregates the data in the other columns?
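
One way to look at it: when only index is set, pivot_table groups the rows by that column and applies aggfunc to each of the value columns, much like groupby followed by sum. Setting margins=True then adds one more row holding the same aggregate over all rows. Below is a small sketch comparing the two on the sample data; the variable names are illustrative, and values is passed explicitly here only to make the aggregated columns obvious.

```python
import pandas as pd

df = pd.DataFrame({
    'Table':  ['A', 'B', 'C', 'B', 'D', 'A'],
    'Select': [10, 12, 10, 19, 13, 11],
    'Update': [8, 2, 2, 3, 0, 7],
    'Insert': [5, 0, 4, 1, 5, 3],
})
cols = ['Select', 'Update', 'Insert']

# Plain groupby: one row per table, each column summed.
by_group = df.groupby('Table')[cols].sum()

# pivot_table with only an index does the same grouping and aggregation.
pivot = df.pivot_table(index='Table', values=cols, aggfunc='sum')

# pivot_table may order the columns differently, so select them explicitly
# before comparing the numbers.
same = (pivot[cols].to_numpy() == by_group.to_numpy()).all()
print(same)

# margins=True appends a row with the aggregate over all rows.
with_total = df.pivot_table(index='Table', values=cols, aggfunc='sum',
                            margins=True, margins_name='Total')
print(with_total.loc['Total', cols])
```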

