Pandas count all occurrences on different columns in a dataframe
I have a data frame similar to this one:

   GRP   HOST1  HOST2  HOST3   FILESIZE
0    0   srv39  srv45  srv47  203498176
1    1  srv102  srv36  srv38  452763956
2    1  srv101  srv36  srv45  453277268
3    1  srv101  srv34  srv45  448174741
4    1   srv36  srv49  srv50  452728577
5    2  srv100  srv47  srv48  454617541
6    2  srv100  srv45  srv49  454617541
7    2   srv38  srv49  srv47  454617541
What I would like to achieve is to count all occurrences across the HOST1, HOST2 and HOST3 columns, grouped by the GRP column, like this:

GRP  HOST    count
1    srv101  2
     srv36   3

It would be perfect if I could also sum the values of the FILESIZE column.

I was trying to shape a solution using suggestions that I have found here, but I have not been able to get the count grouped by GRP.

Any suggestions on the best approach to obtain these results with pandas?
Use melt to reshape the HOST columns into a single column, then aggregate with size:
# Assign to a new name so the original df stays available for the FILESIZE sum below.
counts = (df.melt(id_vars='GRP', value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
            .groupby(['GRP', 'HOST'])
            .size()
            .reset_index(name='count'))
print(counts)
    GRP    HOST  count
0     0   srv39      1
1     0   srv45      1
2     0   srv47      1
3     1  srv101      2
4     1  srv102      1
5     1   srv34      1
6     1   srv36      3
7     1   srv38      1
8     1   srv45      2
9     1   srv49      1
10    1   srv50      1
11    2  srv100      2
12    2   srv38      1
13    2   srv45      1
14    2   srv47      2
15    2   srv48      1
16    2   srv49      2
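For anyone who wants to reproduce this end to end, here is a minimal sketch that builds the sample frame from the question and runs the melt approach. The names long_df and counts are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "GRP": [0, 1, 1, 1, 1, 2, 2, 2],
    "HOST1": ["srv39", "srv102", "srv101", "srv101", "srv36", "srv100", "srv100", "srv38"],
    "HOST2": ["srv45", "srv36", "srv36", "srv34", "srv49", "srv47", "srv45", "srv49"],
    "HOST3": ["srv47", "srv38", "srv45", "srv45", "srv50", "srv48", "srv49", "srv47"],
    "FILESIZE": [203498176, 452763956, 453277268, 448174741,
                 452728577, 454617541, 454617541, 454617541],
})

# melt turns the three HOST columns into one long HOST column,
# repeating the GRP value next to every host.
long_df = df.melt(id_vars="GRP",
                  value_vars=["HOST1", "HOST2", "HOST3"],
                  value_name="HOST")

# Count how many rows fall into each (GRP, HOST) pair.
counts = long_df.groupby(["GRP", "HOST"]).size().reset_index(name="count")

# Show only the group used in the question's expected output.
print(counts[counts["GRP"] == 1])
```

The intermediate long_df has one row per (original row, HOST column) pair, which is why a plain groupby on GRP and HOST is enough afterwards.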
If you also want the sum of the FILESIZE column, keep it as an id column in melt and use agg:
df1 = (df.melt(id_vars=['GRP', 'FILESIZE'], value_vars=['HOST1','HOST2','HOST3'], value_name='HOST')
         .groupby(['GRP', 'HOST'])['FILESIZE']
         .agg(['size','sum'])
         .reset_index())
print(df1)
    GRP    HOST  size         sum
0     0   srv39     1   203498176
1     0   srv45     1   203498176
2     0   srv47     1   203498176
3     1  srv101     2   901452009
4     1  srv102     1   452763956
5     1   srv34     1   448174741
6     1   srv36     3  1358769801
7     1   srv38     1   452763956
8     1   srv45     2   901452009
9     1   srv49     1   452728577
10    1   srv50     1   452728577
11    2  srv100     2   909235082
12    2   srv38     1   454617541
13    2   srv45     1   454617541
14    2   srv47     2   909235082
15    2   srv48     1   454617541
16    2   srv49     2   909235082
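If you would rather control the output column names, named aggregation (available since pandas 0.25) is one option. This is only a sketch on a three-row subset of the question's data (the first three GRP 1 rows); the column names count and total_size are my own choice, not something from the original answer.

```python
import pandas as pd

# Three GRP == 1 rows taken from the question's sample data.
df = pd.DataFrame({
    "GRP": [1, 1, 1],
    "HOST1": ["srv102", "srv101", "srv101"],
    "HOST2": ["srv36", "srv36", "srv34"],
    "HOST3": ["srv38", "srv45", "srv45"],
    "FILESIZE": [452763956, 453277268, 448174741],
})

summary = (
    df.melt(id_vars=["GRP", "FILESIZE"],
            value_vars=["HOST1", "HOST2", "HOST3"],
            value_name="HOST")
      .groupby(["GRP", "HOST"])
      # Named aggregation: output_column=(input_column, function).
      .agg(count=("FILESIZE", "size"),
           total_size=("FILESIZE", "sum"))
      .reset_index()
)
print(summary)
```

Note that a row's FILESIZE is added once for every host that appears in that row, so the per-host sums overlap and do not add up to the total of the FILESIZE column.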
You can also use stack, followed by groupby and size:
s=df.set_index('GRP')[['HOST1','HOST2','HOST3']].stack().to_frame('HOST')
s.groupby([s.index.get_level_values(level=0),s.HOST]).size()
Out[229]:
GRP  HOST  
0    srv39     1
     srv45     1
     srv47     1
1    srv101    2
     srv102    1
     srv34     1
     srv36     3
     srv38     1
     srv45     2
     srv49     1
     srv50     1
2    srv100    2
     srv38     1
     srv45     1
     srv47     2
     srv48     1
     srv49     2
dtype: int64
If you also need the sum:
s=df.set_index(['GRP','FILESIZE'])[['HOST1','HOST2','HOST3']].stack().to_frame('HOST').reset_index(level=1)
s.groupby([s.index.get_level_values(level=0),s.HOST.values]).FILESIZE.agg(['count','sum'])
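Since no output is shown for the stack version with the sum, here is a small sketch that checks it against the melt version on a three-row subset of the question's data. It is a slight variation, not the exact code above: it moves GRP and FILESIZE back into columns so the groupby can use plain column names. The names host_cols, via_stack and via_melt are illustrative.

```python
import pandas as pd

# Three GRP == 1 rows taken from the question's sample data.
df = pd.DataFrame({
    "GRP": [1, 1, 1],
    "HOST1": ["srv102", "srv101", "srv101"],
    "HOST2": ["srv36", "srv36", "srv34"],
    "HOST3": ["srv38", "srv45", "srv45"],
    "FILESIZE": [452763956, 453277268, 448174741],
})
host_cols = ["HOST1", "HOST2", "HOST3"]

# stack: GRP and FILESIZE travel in the index while the HOST columns
# are stacked into one Series, then both are moved back into columns.
stacked = (
    df.set_index(["GRP", "FILESIZE"])[host_cols]
      .stack()
      .rename("HOST")
      .reset_index(["GRP", "FILESIZE"])
)
via_stack = stacked.groupby(["GRP", "HOST"])["FILESIZE"].agg(["count", "sum"])

# melt: the same reshape, done in one call.
via_melt = (
    df.melt(id_vars=["GRP", "FILESIZE"], value_vars=host_cols, value_name="HOST")
      .groupby(["GRP", "HOST"])["FILESIZE"]
      .agg(["count", "sum"])
)

print(via_stack)
# Compare the two results as plain dictionaries.
print(via_stack.to_dict() == via_melt.to_dict())
```

One difference to keep in mind: stack has historically dropped missing values by default, while melt keeps them, so the two can diverge if some HOST cells are NaN.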