
How to assign count of unique values to the records in a data frame in Python

I have a data frame like this:

IP_address
   IP1
   IP1
   IP1
   IP4
   IP4
   IP4
   IP4
   IP4
   IP7
   IP7
   IP7

I would like to count how many times each unique value occurs in this column and add that count as a new column. At the end, it should look like this:

IP_address  IP_address_Count
   IP1               3
   IP1               3
   IP1               3
   IP4               5
   IP4               5
   IP4               5
   IP4               5
   IP4               5
   IP7               3
   IP7               3
   IP7               3

I am able to get the unique values of the column using the code below:

unique_ip_address_count = (df_c_train.drop_duplicates().IP_address.value_counts()).to_dict()

However, I am not sure how to match these in a loop in Python to get the desired result. Any help is much appreciated.

I was not able to find an equivalent answer on Stack Overflow. If there is one, please direct me there. Thank you.

You can use value_counts() with map:

df['count'] = df['IP_address'].map(df['IP_address'].value_counts())


    IP_address  count
0   IP1         3
1   IP1         3
2   IP1         3
3   IP4         5
4   IP4         5
5   IP4         5
6   IP4         5
7   IP4         5
8   IP7         3
9   IP7         3
10  IP7         3
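
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample column from the question and applies the same value_counts() plus map idea. The column names IP_address_Count and count_from_dict are only illustrative choices; the second one shows that a plain dictionary, like the one built with to_dict() in the question, can be mapped without a loop.

```python
import pandas as pd

# Rebuild the sample column from the question.
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# value_counts() returns a Series indexed by the unique values,
# with the number of occurrences as its values.
counts = df["IP_address"].value_counts()

# map() looks up each row's value in that Series ...
df["IP_address_Count"] = df["IP_address"].map(counts)

# ... and a plain dict works as the lookup table too.
df["count_from_dict"] = df["IP_address"].map(counts.to_dict())
```

Both new columns hold the same per-row counts; no explicit loop over the rows is needed.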

Using pd.factorize
This should be a very fast solution that scales well to large data.

f, u = pd.factorize(df.IP_address.values)
df.assign(IP_address_Count=np.bincount(f)[f])

   IP_address  IP_address_Count
0         IP1                 3
1         IP1                 3
2         IP1                 3
3         IP4                 5
4         IP4                 5
5         IP4                 5
6         IP4                 5
7         IP4                 5
8         IP7                 3
9         IP7                 3
10        IP7                 3
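
To make the two steps easier to follow, here is a sketch of the same factorize and bincount approach with the intermediate arrays spelled out. The variable names codes, per_value and result are illustrative, and the sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# factorize() gives each distinct value an integer code, numbered in
# order of first appearance, plus the array of distinct values.
codes, uniques = pd.factorize(df["IP_address"].values)
# codes -> [0 0 0 1 1 1 1 1 2 2 2]

# bincount() counts how often each code occurs: one entry per distinct value.
per_value = np.bincount(codes)  # -> [3 5 3]

# Indexing the counts with the codes spreads them back over every row.
result = df.assign(IP_address_Count=per_value[codes])
```

Note that assign returns a new data frame and leaves df unchanged, so the result has to be kept in a variable or assigned back.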

NumPy way:

tags, C = np.unique(df.IP_address, return_inverse=True, return_counts=True)[1:]
df['IP_address_Count'] = C[tags]

Sample output:

In [275]: df
Out[275]: 
   IP_address  IP_address_Count
0         IP1                 3
1         IP1                 3
2         IP1                 3
3         IP4                 5
4         IP4                 5
5         IP4                 5
6         IP4                 5
7         IP4                 5
8         IP7                 3
9         IP7                 3
10        IP7                 3
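
As a rough illustration of what np.unique hands back here, the sketch below uses a small, made-up unsorted array (not the data from the question) so that the sorted order of the unique values differs from the row order. The names uniques, inverse and per_row are illustrative.

```python
import numpy as np

# Hypothetical unsorted input, chosen so that sorting changes the order.
ips = np.array(["IP7", "IP1", "IP7", "IP4", "IP1", "IP7"])

# With both flags set, np.unique returns (sorted uniques, inverse, counts).
uniques, inverse, counts = np.unique(ips, return_inverse=True, return_counts=True)
# uniques -> ['IP1' 'IP4' 'IP7'], counts -> [2 1 3]

# inverse holds, for each row, the position of its value in `uniques`,
# so counts[inverse] is the per-row occurrence count.
per_row = counts[inverse]
```

The unique values come back sorted, but the per-row counts are still in the original row order because they are looked up through inverse.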

Using groupby with transform('size'):

In [75]: df['IP_address_Count'] = df.groupby('IP_address')['IP_address'].transform('size')

In [76]: df
Out[76]:
   IP_address  IP_address_Count
0         IP1                 3
1         IP1                 3
2         IP1                 3
3         IP4                 5
4         IP4                 5
5         IP4                 5
6         IP4                 5
7         IP4                 5
8         IP7                 3
9         IP7                 3
10        IP7                 3
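
A short sketch of why transform fits this task, using the sample data from the question: a plain size() collapses to one row per group, while transform('size') returns one value per original row. The name per_group is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# size() alone collapses the frame to one row per group.
per_group = df.groupby("IP_address").size()

# transform("size") computes the same numbers but returns one value per
# original row, on the original index, so it can be assigned straight back.
df["IP_address_Count"] = df.groupby("IP_address")["IP_address"].transform("size")
```

Because the result of transform keeps the original index, no merge or lookup step is needed afterwards.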

ip_set = df.IP_address.unique()
dict_temp = {}
for ip in ip_set:
    # Count the rows whose IP_address equals this value.
    dict_temp[ip] = df[df.IP_address == ip].IP_address.value_counts().iloc[0]
# Look up the count for every row.
df['counts'] = [dict_temp[ip] for ip in df.IP_address]

This seems to give the sort of output you want.

EDIT: Vaishali's use of map is perfect.

