Pandas groupby with sum on a few columns, retaining the other column
I have a table that looks like this:
msno date num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150513 0 0 0 0 1 1
1 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 20150709 9 1 0 0 7 11
2 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150105 3 3 0 0 68 36
3 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150306 1 0 1 1 97 27
4 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150501 3 0 0 0 38 38
5 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150702 4 0 1 1 33 10
6 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20150830 3 1 0 0 4 7
7 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20151107 1 0 0 0 4 5
8 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160110 2 0 1 0 11 6
9 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160316 9 3 4 1 67 50
10 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160510 5 3 2 1 67 66
11 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160804 1 4 5 0 36 43
12 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20160926 7 1 0 1 38 20
13 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20161115 0 1 4 1 38 40
14 yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8= 20170106 0 0 0 1 39 38
15 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20151201 3 3 2 0 8 11
16 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20160628 0 0 1 1 1 3
17 PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g= 20170106 2 1 0 0 35 34
18 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20150803 0 0 0 0 16 11
19 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160527 4 3 0 2 2 11
20 KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8= 20160808 14 3 4 1 15 31
How should I sum the columns 'num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq' and 'total_secs' to get the totals, leaving only one row per unique msno?
For example, after grouping all rows with the same msno, it should produce the result below, with the date column discarded:
msno num_25 num_50 num_75 num_985 num_100 num_unq \
0 rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34= 9 1 0 0 8 12
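A minimal sketch of the usual approach, assuming pandas is installed: group on msno alone and sum only the selected columns, so date is never carried into the result. The DataFrame below is rebuilt from a few rows of the table above; total_secs is left out because its values are not shown in the printed table, so on the real data you would add it to sum_cols.

```python
import pandas as pd

# A few rows copied from the table above. 'total_secs' is not visible in the
# printed table, so it is omitted here (add it to sum_cols on the real data).
df = pd.DataFrame({
    "msno": [
        "rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=",
        "rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=",
        "PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=",
        "PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=",
        "PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=",
    ],
    "date": [20150513, 20150709, 20151201, 20160628, 20170106],
    "num_25": [0, 9, 3, 0, 2],
    "num_50": [0, 1, 3, 0, 1],
    "num_75": [0, 0, 2, 1, 0],
    "num_985": [0, 0, 0, 1, 0],
    "num_100": [1, 7, 8, 1, 35],
    "num_unq": [1, 11, 11, 3, 34],
})

sum_cols = ["num_25", "num_50", "num_75", "num_985", "num_100", "num_unq"]

# Group on 'msno' only. 'date' is not among the selected columns, so it is dropped.
# reset_index() turns the group key back into an ordinary 'msno' column.
result = df.groupby("msno")[sum_cols].sum().reset_index()
print(result)
```

With these rows, the rxIP2f2a... group sums to 9, 1, 0, 0, 8, 12, which matches the expected result above.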
I tried this, but msno is still duplicated and the date column is still there:
df_user_logs_v2.groupby(['msno', 'date'])[['num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs']].sum()
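The attempt keeps duplicates because groupby(['msno', 'date']) forms one group per distinct (msno, date) pair. Every row in the table has a different date for a given msno, so no rows are combined, and date ends up in the result's index. The sketch below contrasts that with dropping date and grouping by msno alone; the two-row frame is a hypothetical miniature of df_user_logs_v2, not the real data.

```python
import pandas as pd

# Hypothetical miniature of df_user_logs_v2: one user, two rows on different dates.
df_user_logs_v2 = pd.DataFrame({
    "msno": ["rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34="] * 2,
    "date": [20150513, 20150709],
    "num_25": [0, 9],
    "num_100": [1, 7],
})

# Grouping by both keys: each (msno, date) pair is its own group,
# so the two rows stay separate and 'date' becomes part of the index.
by_both = df_user_logs_v2.groupby(["msno", "date"])[["num_25", "num_100"]].sum()

# Dropping 'date' first and grouping by 'msno' alone collapses them into one row.
# as_index=False keeps 'msno' as a regular column instead of the index.
by_user = df_user_logs_v2.drop(columns="date").groupby("msno", as_index=False).sum()

print(by_both)
print(by_user)
```

Dropping date and summing everything else assumes every remaining column besides msno is numeric; if the real frame has other non-numeric columns, selecting an explicit column list, as in the attempt, is safer.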
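Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.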