Python DataFrame: counting occurrences across two columns

Question

Two DataFrame columns:

data['IP']          data['domain']
10.20.30.40         example.org 
10.20.30.40         example.org
10.20.30.40         example.org
10.20.30.40         example.org
1.2.3.4             google.com
1.2.3.4             google.com
1.2.3.4             google.com
200.100.200.100     yahoo.com
200.100.200.100     yahoo.com
9.8.7.6             random.com

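I want an efficient way to count how many times each domain maps to the same IP address. Then, if the number of occurrences is greater than two (2), I want to take that domain (keeping only unique values) and move it to another DataFrame or column.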
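
A minimal sketch of the counting step, assuming the data is a DataFrame named `data` with the `IP` and `domain` columns shown above (the names `pair_counts`, `frequent` and `to_process` are illustrative only). Grouping on both columns and calling `size()` counts how many rows share each (IP, domain) pair; filtering on `> 2` and taking the unique domains gives the list to process.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    'IP': ['10.20.30.40'] * 4 + ['1.2.3.4'] * 3
          + ['200.100.200.100'] * 2 + ['9.8.7.6'],
    'domain': ['example.org'] * 4 + ['google.com'] * 3
              + ['yahoo.com'] * 2 + ['random.com'],
})

# Count how many rows share each (IP, domain) pair.
pair_counts = data.groupby(['IP', 'domain']).size()

# Keep only the pairs seen more than twice, then take their unique domains.
frequent = pair_counts[pair_counts > 2]
to_process = frequent.index.get_level_values('domain').unique().tolist()

print(to_process)
```

Note that `groupby` sorts its keys, so `to_process` comes out ordered by IP string rather than by first appearance.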

So the output might look like this:

[Occurrences]   [To be processed]
4               example.org
4               google.com
4
4
3               
3
3

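I have tried different approaches, such as building a graph and taking the degree of each node, or using a pivot table to get the counts, but I want an efficient approach that lets me continue processing the domains after an "if count > 2 then" check.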
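
One way to keep the "> 2" check inside the DataFrame is a sketch like the following, assuming the same `data` frame as in the question; the `Occurrences` column and `to_be_processed` frame are hypothetical names chosen to match the sample output. `transform('size')` writes each row's (IP, domain) group size back onto that row, so the per-row counts line up with the sketched [Occurrences] column, and the condition becomes an ordinary boolean mask.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    'IP': ['10.20.30.40'] * 4 + ['1.2.3.4'] * 3
          + ['200.100.200.100'] * 2 + ['9.8.7.6'],
    'domain': ['example.org'] * 4 + ['google.com'] * 3
              + ['yahoo.com'] * 2 + ['random.com'],
})

# Attach the size of each row's (IP, domain) group to every row.
data['Occurrences'] = data.groupby(['IP', 'domain'])['domain'].transform('size')

# The "> 2" condition as a boolean mask over the rows.
mask = data['Occurrences'] > 2

# Unique domains that pass the check, in order of first appearance,
# moved into a separate DataFrame for further processing.
to_be_processed = pd.DataFrame({'domain': data.loc[mask, 'domain'].unique()})

print(data.loc[mask, 'Occurrences'].tolist())
print(to_be_processed)
```

Compared with a graph or a pivot table, this keeps the counts aligned with the original rows, so any later processing can filter on `mask` directly.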

All of this should be implemented with Python pandas DataFrames!

Answer 1

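The code below does a groupby on 'domain' and then calls value_counts on the 'IP' column. We then keep only the counts greater than 2 and reset the index, naming the count column so that the columns are meaningful:

In [58]:
# Count IPs per domain; keep (domain, IP) pairs seen more than twice and
# name the count column explicitly when flattening the MultiIndex.
gp = df.groupby('domain')['IP'].value_counts()
df1 = gp[gp > 2].reset_index(name='Occurrences')
df1

Out[58]:
        domain           IP  Occurrences
0  example.org  10.20.30.40            4
1   google.com      1.2.3.4            3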
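
As an alternative that is not part of the original answer, pandas 1.1 and later provide DataFrame.value_counts, which counts unique combinations of the given columns directly. A minimal sketch, assuming the frame is called `df` as in the answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'IP': ['10.20.30.40'] * 4 + ['1.2.3.4'] * 3
          + ['200.100.200.100'] * 2 + ['9.8.7.6'],
    'domain': ['example.org'] * 4 + ['google.com'] * 3
              + ['yahoo.com'] * 2 + ['random.com'],
})

# Count each unique (domain, IP) combination; the result is a Series with a
# (domain, IP) MultiIndex, sorted by count in descending order.
counts = df.value_counts(['domain', 'IP'])

# Passing name= sets the count column explicitly, so the result does not
# depend on how a particular pandas version names the Series.
df1 = counts[counts > 2].reset_index(name='Occurrences')
print(df1)
```

Because the counts are already sorted, the most frequent pairs appear first in `df1`.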

Answer 1 · 3 · 2015-06-23 13:00:51