How to assign the count of unique values to the records in a data frame in Python
I have a data frame like this:
IP_address
IP1
IP1
IP1
IP4
IP4
IP4
IP4
IP4
IP7
IP7
IP7
I would like to count how many times each unique value occurs in this column and add that count as a new column. At the end, it should look like this:
IP_address IP_address_Count
IP1 3
IP1 3
IP1 3
IP4 5
IP4 5
IP4 5
IP4 5
IP4 5
IP7 3
IP7 3
IP7 3
I am able to get the unique values of the column using the code below:
unique_ip_address_count = (df_c_train.drop_duplicates().IP_address.value_counts()).to_dict()
However, I am not sure how to match these in a loop in Python so that I can get the desired result. Any sort of help is much appreciated.
I was not able to find an equivalent answer on Stack Overflow. If there is one, please direct me there. Thank you.
You can use value_counts() with map:
df['count'] = df['IP_address'].map(df['IP_address'].value_counts())
IP_address count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
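For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, and the column name IP_address_Count is taken from the question's desired output rather than from this answer. value_counts() returns a Series indexed by the distinct values, and map looks each row's value up in that index.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# value_counts() returns a Series indexed by each distinct IP.
counts = df["IP_address"].value_counts()

# map() looks every row's IP up in that Series and returns its count.
df["IP_address_Count"] = df["IP_address"].map(counts)

print(df)
```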
Using pd.factorize
This should be a very fast solution that scales well for large data:
import numpy as np
import pandas as pd
f, u = pd.factorize(df.IP_address.values)
df.assign(IP_address_Count=np.bincount(f)[f])
IP_address IP_address_Count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
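To make the two steps easier to follow, here is the same idea written out as a sketch with intermediate names (codes, per_code and result are illustrative names, not from the answer). It assumes the column has no missing values: factorize labels missing values as -1, and np.bincount does not accept negative input.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# codes: one integer label per row; uniques: distinct values in order of first appearance.
codes, uniques = pd.factorize(df["IP_address"])

# per_code[k] is the number of rows carrying label k.
per_code = np.bincount(codes)

# Indexing per_code with codes spreads each group's count back over its rows.
# assign() returns a new frame and leaves df unchanged.
result = df.assign(IP_address_Count=per_code[codes])

print(result)
```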
The NumPy way:
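# np.unique returns (uniques, inverse, counts); [1:] keeps the inverse indices and the counts
tags, C = np.unique(df.IP_address, return_counts=True, return_inverse=True)[1:]
df['IP_address_Count'] = C[tags]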
Sample output:
In [275]: df
Out[275]:
IP_address IP_address_Count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
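A runnable sketch of the NumPy approach, unpacking all three return values so that each has a name (uniques, inverse and counts are illustrative names). The sample frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# uniques: sorted distinct values
# inverse: for each row, the position of its value inside uniques
# counts:  how often each entry of uniques occurs
uniques, inverse, counts = np.unique(
    df["IP_address"].to_numpy(), return_inverse=True, return_counts=True
)

# counts[inverse] picks, for every row, the count of that row's value.
df["IP_address_Count"] = counts[inverse]

print(df)
```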
In [75]: df['IP_address_Count'] = df.groupby('IP_address')['IP_address'].transform('size')
In [76]: df
Out[76]:
IP_address IP_address_Count
0 IP1 3
1 IP1 3
2 IP1 3
3 IP4 5
4 IP4 5
5 IP4 5
6 IP4 5
7 IP4 5
8 IP7 3
9 IP7 3
10 IP7 3
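The groupby answer above comes without commentary, so here is a small sketch of what transform('size') does, using the question's data. The names grouped and group_sizes are illustrative. size() yields one value per group, while transform('size') repeats that value on every row of the group, aligned with the original index.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

grouped = df.groupby("IP_address")["IP_address"]

# One row per group: the number of records for each IP.
group_sizes = grouped.size()

# Same numbers, but broadcast back to the shape of the original column.
df["IP_address_Count"] = grouped.transform("size")

print(group_sizes)
print(df)
```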
ip_set = df.IP_address.unique()
dict_temp = {}
for ip in ip_set:
    # count the rows whose IP_address equals this ip
    dict_temp[ip] = df[df.IP_address == ip].IP_address.value_counts().iloc[0]
df['counts'] = [dict_temp[ip] for ip in df.IP_address]
This seems to give the sort of output that you want.
EDIT: Vaishali's use of map is perfect.
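Since the question already builds a dictionary of counts, it may help to see that map also accepts a plain dict, so no explicit loop is needed. This is only a sketch on the question's sample data; counts_by_ip is an illustrative name.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({"IP_address": ["IP1"] * 3 + ["IP4"] * 5 + ["IP7"] * 3})

# Dictionary of {ip: number of rows with that ip}, built without a loop.
counts_by_ip = df["IP_address"].value_counts().to_dict()

# map() accepts a dict and looks each row's value up in it.
df["counts"] = df["IP_address"].map(counts_by_ip)

print(df)
```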