Pandas dataframe - sum of each column based on group
Pandas - sum of each column based on group by first column
I have a text file with a Table column and three other columns: Select, Update and Insert. I would like to group by Table, sum each of the other columns, and add a grand total at the end.
df=data.groupby(['Table'])
print df.groupby(['Table'])["Select","Update","Insert"].agg('sum')
Text file has data in this format
Table Select Update Insert
A 10 8 5
B 12 2 0
C 10 2 4
B 19 3 1
D 13 0 5
A 11 7 3
Expected output
Table Select Update Insert
A 21 15 8
B 31 5 1
C 10 2 4
D 13 0 5
Total 75 22 18
df.groupby with sum isn't aggregating the data properly for every column. If I aggregate on only one column it works, but with all three columns the output on my terminal is all messed up.
Appreciate your help!
You can try
df.groupby(by='Table').sum()
for the aggregate table:
Select Update Insert
Table
A 21 15 8
B 31 5 1
C 10 2 4
D 13 0 5
And
df.groupby(by='Table').sum().sum()
for the totals:
Select 75
Update 22
Insert 18
dtype: int64
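To get both results in one table, as in the expected output, the totals can be appended as an extra row. This is a minimal sketch, assuming the data is already loaded with numeric columns; the name per_table is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Table': ['A', 'B', 'C', 'B', 'D', 'A'],
    'Select': [10, 12, 10, 19, 13, 11],
    'Update': [8, 2, 2, 3, 0, 7],
    'Insert': [5, 0, 4, 1, 5, 3],
})

# Per-table sums; the group key 'Table' becomes the index.
per_table = df.groupby('Table').sum()

# Append the column totals as one extra row labelled 'Total'.
per_table.loc['Total'] = per_table.sum()

print(per_table)
```

The totals are computed before the new row is added, so the 'Total' row is not counted twice.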
You can try using the pandas pivot_table function with margins=True:
import pandas as pd

data = {'Table': ['A','B','C','B','D','A'], 'Select': [10,12,10,19,13,11], 'Update': [8,2,2,3,0,7], 'Insert': [5,0,4,1,5,3]}
df = pd.DataFrame(data)
df2 = df.pivot_table(index='Table',
                     margins=True,
                     margins_name='Total',  # defaults to 'All'
                     aggfunc='sum')
# Turn the 'Table' index back into a regular column
df2.reset_index(inplace=True)
# Select the columns in the original order
df2[['Table','Select','Update','Insert']]
And you will get the required output:
Table Select Update Insert
0 A 21 15 8
1 B 31 5 1
2 C 10 2 4
3 D 13 0 5
4 Total 75 22 18
Hope this helps!
Table ...
A 10 8 5 0.0 ... 0.0
A 11 7 3 0.0 ... 0.0
B 12 2 0 0.0 ... 0.0
B 19 3 1 0.0 ... 0.0
C 10 2 4 0.0 ... 0.0
D 13 0 5 0.0 ... 0.0
Table Select Update Insert 0.0 ... 0.0
[7 rows x 3 columns]
The above is the output I get with df.groupby(by='Table').sum()
It appears that when the data is loaded from the .log file, it isn't framed correctly for pandas to process.
This is how the data is being loaded:
df=pd.DataFrame(data)
print df
Output of frame I get,
Table ... Insert
0 Table Select Update Insert ... NaN
1 A 10 8 5 ... NaN
2 B 12 2 0 ... NaN
3 C 10 2 4 ... NaN
4 B 19 3 1 ... NaN
5 D 13 0 5 ... NaN
6 A 11 7 3 ... NaN
versus
when I load in data frame using below,
data={'Table':['A','B','C','B','D','A'],'Select':[10,12,10,19,13,11],'Update':[8,2,2,3,0,7],'Insert':[5,0,4,1,5,3]}
output of print df is
{'Table': ['A', 'B', 'C', 'B', 'D', 'A'], 'Update': [8, 2, 2, 3, 0, 7], 'Select': [10, 12, 10, 19, 13, 11], 'Insert': [5, 0, 4, 1, 5, 3]}
and pivot_table provides the output as expected.
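The header line showing up as row 0 in the frame above suggests the file was not parsed into typed columns. One possible way to load it is sketched below, assuming the file is whitespace-separated with a single header line, as in the sample in the question. The in-memory string stands in for the real .log file, whose path is not given here.

```python
import io
import pandas as pd

# Stand-in for the real file; with a file on disk, pass its path instead.
raw = io.StringIO("""Table Select Update Insert
A 10 8 5
B 12 2 0
C 10 2 4
B 19 3 1
D 13 0 5
A 11 7 3
""")

# Split on any run of whitespace; the first line is used as the header.
df = pd.read_csv(raw, sep=r'\s+')

# The numeric columns are parsed as integers, so groupby can sum them.
totals = df.groupby('Table').sum()
print(totals)
```

If the real file contains extra lines before or after the table, it would need additional handling (for example the skiprows argument of read_csv), which is not shown here.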
jitesh singla: If you don't mind, could you please explain how pivot_table groups by the Table column and aggregates the data for the other columns?
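As a rough illustration of that question: with index='Table' and no values argument, pivot_table aggregates every remaining column per distinct Table value, which gives the same numbers as groupby('Table').sum(). The margins=True option then adds one more row holding the aggregate over all rows. This is a sketch on the sample data, not a description of the pandas internals; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Table': ['A', 'B', 'C', 'B', 'D', 'A'],
    'Select': [10, 12, 10, 19, 13, 11],
    'Update': [8, 2, 2, 3, 0, 7],
    'Insert': [5, 0, 4, 1, 5, 3],
})

# One row per distinct 'Table' value, each other column summed.
pivoted = df.pivot_table(index='Table', aggfunc='sum')
grouped = df.groupby('Table').sum()

# Align the column order before comparing the two results.
same = (pivoted[grouped.columns] == grouped).all().all()
print(same)

# margins=True appends a row with the sum over all rows.
with_total = df.pivot_table(index='Table', aggfunc='sum',
                            margins=True, margins_name='Total')
print(with_total[['Select', 'Update', 'Insert']])
```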