Pandas DataFrame - sum of each column based on group

Question

Pandas - sum of each column based on group by first column

I have this text file which has a Table column and 3 other columns indicating Select, Update and Insert. I would like to group by Table, sum each column, and add a grand total at the end.

df = data.groupby(['Table'])
print(df[["Select", "Update", "Insert"]].agg('sum'))

The text file has data in this format:
Table Select Update Insert
A        10      8      5
B        12      2      0
C        10      2      4
B        19      3      1
D        13      0      5
A        11      7      3

Expected output
Table Select Update Insert
A        21      15     8
B        31      5      1
C        10      2      4
D        13      0      5
Total    75      22    18

df.groupby with sum isn't aggregating the data properly for every column. If the aggregation is done on only one column it works, but otherwise the output on my terminal is all messed up.

Appreciate your help!

Answer 1

You can try df.groupby(by='Table').sum() for the aggregate table:

       Select  Update  Insert
Table                        
A          21      15       8
B          31       5       1
C          10       2       4
D          13       0       5

And df.groupby(by='Table').sum().sum() for the totals:

Select    75
Update    22
Insert    18
dtype: int64
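The two results above can be combined into the single table the question asks for. The following is a minimal sketch, assuming the sample data from the question is rebuilt inline (the file-loading step is not shown); appending the column totals with .loc["Total"] is one way to do it, not necessarily what the answerer intended.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from the text file.
df = pd.DataFrame({
    "Table": ["A", "B", "C", "B", "D", "A"],
    "Select": [10, 12, 10, 19, 13, 11],
    "Update": [8, 2, 2, 3, 0, 7],
    "Insert": [5, 0, 4, 1, 5, 3],
})

# Step 1: per-table sums of every numeric column.
summary = df.groupby(by="Table").sum()

# Step 2: column totals, appended as a new row labelled "Total".
summary.loc["Total"] = summary.sum()

print(summary)
```

Because summary.sum() is computed before the new row exists, the "Total" row holds only the grand totals and is not double-counted.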

Answer 2

You can try using the pandas pivot_table function with margins=True:

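import pandas as pd

data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}

df = pd.DataFrame(data)

# Group by 'Table', sum every other column, and add a 'Total' row (margins)
df2 = df.pivot_table(index='Table',
                     margins=True,
                     margins_name='Total',  # defaults to 'All'
                     aggfunc='sum')

# Move 'Table' from the index back into a regular column
df2.reset_index(inplace=True)

# pivot_table may reorder the value columns, so select them in the original order
print(df2[['Table', 'Select', 'Update', 'Insert']])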

And you will get the required output:

   Table  Select  Update  Insert
0      A      21      15       8
1      B      31       5       1
2      C      10       2       4
3      D      13       0       5
4  Total      75      22      18

Hope this helps!

Answer 3

Table                               ...        
A        10      8      5      0.0  ...     0.0
A        11      7      3      0.0  ...     0.0
B        12      2      0      0.0  ...     0.0
B        19      3      1      0.0  ...     0.0
C        10      2      4      0.0  ...     0.0
D        13      0      5      0.0  ...     0.0
Table Select Update Insert     0.0  ...     0.0

[7 rows x 3 columns]

This is the output I get with df.groupby(by='Table').sum().

Answer 4

It appears that when loading data from the .log file, the data isn't framed correctly for pandas to process.

This is how the data is being loaded:


df = pd.DataFrame(data)
print(df)

Output of the frame I get:

                        Table  ...  Insert
0  Table Select Update Insert  ...     NaN
1   A        10      8      5  ...     NaN
2   B        12      2      0  ...     NaN
3   C        10      2      4  ...     NaN
4   B        19      3      1  ...     NaN
5   D        13      0      5  ...     NaN
6   A        11      7      3  ...     NaN

versus when I load the data frame using the code below:
data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}

the output of printing it is:
{'Table': ['A', 'B', 'C', 'B', 'D', 'A'], 'Update': [8, 2, 2, 3, 0, 7], 'Select': [10, 12, 10, 19, 13, 11], 'Insert': [5, 0, 4, 1, 5, 3]}

and pivot_table provides the output as expected.
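Since the problem turned out to be the loading step (the whole header line ended up in the Table column), one way to read the file directly is pd.read_csv with a whitespace separator. This is a minimal sketch, assuming the file is whitespace-delimited with the header on its first line, exactly as shown in the question; the question does not show the original loading code, so an in-memory io.StringIO stands in for the real .log file path.

```python
import io

import pandas as pd

# Stand-in for the .log/text file from the question; with the real file,
# pass its path to read_csv instead of this StringIO object.
raw = io.StringIO("""Table Select Update Insert
A        10      8      5
B        12      2      0
C        10      2      4
B        19      3      1
D        13      0      5
A        11      7      3
""")

# sep=r"\s+" splits on any run of spaces/tabs, and the first line becomes
# the header, so each value lands in its own (numeric) column.
df = pd.read_csv(raw, sep=r"\s+")

# With properly separated columns, the groupby from the answers works as expected.
print(df.groupby("Table").sum())
```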

jitesh singla: If you don't mind, could you please explain how pivot_table groups by the Table column and aggregates the data in the other columns?
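This is not from the answerer. It is a minimal sketch, using the same sample data, that checks the relationship empirically: with only index='Table' given, pivot_table groups by Table and sums each remaining column, which matches groupby(...).sum(), and margins=True adds one row with the aggregate over each whole column.

```python
import pandas as pd

df = pd.DataFrame({
    "Table": ["A", "B", "C", "B", "D", "A"],
    "Select": [10, 12, 10, 19, 13, 11],
    "Update": [8, 2, 2, 3, 0, 7],
    "Insert": [5, 0, 4, 1, 5, 3],
})
cols = ["Select", "Update", "Insert"]

# With only index= given, every other column is treated as a value column,
# rows are grouped by 'Table', and aggfunc is applied to each group.
pivot = df.pivot_table(index="Table", aggfunc="sum",
                       margins=True, margins_name="Total")

# The same per-table sums computed with groupby (no total row).
grouped = df.groupby("Table").sum()

# The group rows of the pivot table hold the same numbers as the groupby result.
groups_match = (pivot.loc[grouped.index, cols].to_numpy()
                == grouped[cols].to_numpy()).all()

# margins=True appends one extra row: aggfunc applied to each whole column.
total_matches = (pivot.loc["Total", cols] == df[cols].sum()).all()

print(groups_match, total_matches)
```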

4 answers

Answer 1
1 2019-06-30 05:49:38

Answer 2
1 2019-06-30 07:17:53

Answer 3
0 2019-06-30 06:01:24

Answer 4
0 2019-07-01 15:37:42
