Python, count frequency of occurrence for value in another column
So I've been scouring Stack Overflow for solutions to similar problems and keep hitting walls. I am new to Python and am using pandas for ETL, so forgive me if I am not describing my situation adequately.
I have two dataframes. df1 looks like:
   Subscriber Key  OtherID  AnotherID
1  'abc'           '12'     '23'
2  'bcd'           '45'     '56'
3  'abc'           '12'     '23'
4  'abc'           '12'     '23'
5  'cde'           '78'     '90'
6  'bcd'           '45'     '56'
df2 looks like:
   Subscriber Key  OtherID  AnotherID
1  'abc'           '12'     '23'
2  'bcd'           '45'     '56'
3  'cde'           '78'     '90'
I am trying to count the number of times a SubscriberKey such as 'abc' occurs in the dataframe. After finding the counts, I would like to append them to another dataframe (df2), which is my first dataframe deduplicated.
It would look like this:
   Subscriber Key  OtherID  AnotherID  Total Instances
1  'abc'           '12'     '23'       '3'
2  'bcd'           '45'     '56'       '2'
3  'cde'           '78'     '90'       '1'
So what I did was try to use this line:
df1.groupby(['SubscriberKey']).size()
The reason I only used 'SubscriberKey' is that some rows have only that column filled in, with 'OtherID' and 'AnotherID' blank.
I have also tried Series.value_counts(). When I use groupby and size() and set df2['Total Instances'] to the count of occurrences, the values do not appear to line up correctly.
For example, the new table looks like this:
   Subscriber Key  OtherID  AnotherID  Total Instances
1  'abc'           '12'     '23'       '1'
2  'bcd'           '45'     '56'       '3'
3  'cde'           '78'     '90'       '2'
So my original thought was that maybe groupby sorts my output automatically. I tried to check by saving the grouped result as a CSV and realized it only writes out the count column, without the associated SubscriberKey column.
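A minimal sketch of what groupby and size() return may help here. It assumes the column is named 'Subscriber Key' as in the tables above, and the names `counts`, `counts_unsorted` and `counts_df` are illustrative only. The result of size() is a Series whose index holds the group keys (sorted by default), so the keys are not an ordinary column until reset_index is called.

```python
import pandas as pd

# Sample data mirroring df1 from the question (column names are assumed).
df1 = pd.DataFrame({
    "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
    "OtherID": ["12", "45", "12", "12", "78", "45"],
    "AnotherID": ["23", "56", "23", "23", "90", "56"],
})

# size() returns a Series: group keys are the index, counts are the values.
counts = df1.groupby("Subscriber Key").size()

# groupby sorts the keys by default; sort=False keeps first-seen order instead.
counts_unsorted = df1.groupby("Subscriber Key", sort=False).size()

# Move the keys out of the index into a regular column next to the counts.
counts_df = counts.reset_index(name="Total Instances")
print(counts_df)
```

Because the counts are labelled by key rather than by row position, copying them into df2 positionally can put them on the wrong rows whenever df2 is in a different order.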
Anyhow, does anybody have any input as to how I can achieve this? To reiterate, I essentially just want to add a column to df2 that holds the total number of occurrences of each key within df1.
Thanks!
You can try:
df2['Total Instances'] = df2['Subscriber Key'].map(df1['Subscriber Key'].value_counts())
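To see why this works, here is a small self-contained sketch using the sample data from the question. The DataFrame construction is a reconstruction of that data, not the asker's actual code, and `counts` is an illustrative name. value_counts() returns a Series indexed by key, and map looks up each key of df2 in that Series, so the counts are matched by value rather than by row position.

```python
import pandas as pd

# Reconstructed sample data (assumed to match df1 in the question).
df1 = pd.DataFrame({
    "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
    "OtherID": ["12", "45", "12", "12", "78", "45"],
    "AnotherID": ["23", "56", "23", "23", "90", "56"],
})

# df2 is df1 with duplicate rows removed, as described in the question.
df2 = df1.drop_duplicates().reset_index(drop=True)

# Count how often each key occurs in df1; the result is indexed by key.
counts = df1["Subscriber Key"].value_counts()

# Look up each df2 key in the counts, so row order does not matter.
df2["Total Instances"] = df2["Subscriber Key"].map(counts)
print(df2)
```

Any key in df2 that does not occur in df1 would get NaN from map; chaining .fillna(0) is one way to handle that if it can happen in your data.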