
Python, count frequency of occurrence for value in another column

So I've been scouring Stack Overflow for solutions to similar problems and keep hitting walls. I am new to Python and am using pandas for ETL, so forgive me if I don't describe my situation adequately.

I have two dataframes. df1 looks like:

        Subscriber Key  OtherID  AnotherID
    1   'abc'           '12'     '23'
    2   'bcd'           '45'     '56'
    3   'abc'           '12'     '23'
    4   'abc'           '12'     '23'
    5   'cde'           '78'     '90'
    6   'bcd'           '45'     '56'

df2 looks like:

        Subscriber Key  OtherID  AnotherID
    1   'abc'           '12'     '23'
    2   'bcd'           '45'     '56'
    3   'cde'           '78'     '90'

I am trying to count the number of times each Subscriber Key (for example 'abc') occurs in df1. After finding the counts, I would like to append them as a new column to another dataframe, df2, which is my first dataframe deduplicated.
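To make the question reproducible, here is a minimal sketch that rebuilds the two sample frames. It assumes the quoted IDs are stored as strings and that "deduplicated" means pandas' drop_duplicates over all columns; both are assumptions, since the question does not show how df2 was built.

```python
import pandas as pd

# Sample data from the df1 table; IDs kept as strings (assumption based on the quotes).
df1 = pd.DataFrame(
    {
        "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
        "OtherID": ["12", "45", "12", "12", "78", "45"],
        "AnotherID": ["23", "56", "23", "23", "90", "56"],
    }
)

# df2 = df1 with fully duplicated rows removed (the first occurrence is kept),
# then renumbered 0..n-1.
df2 = df1.drop_duplicates().reset_index(drop=True)
print(df2)
```

Note that pandas numbers rows from 0 by default, whereas the tables in the question start at 1.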

The desired result would look like this:

        Subscriber Key  OtherID  AnotherID  Total Instances
    1   'abc'           '12'     '23'       '3'
    2   'bcd'           '45'     '56'       '2'
    3   'cde'           '78'     '90'       '1'

So I tried using this line:

    df1.groupby(['Subscriber Key']).size()

I grouped only by 'Subscriber Key' because some rows have only that column filled in, with 'OtherID' and 'AnotherID' left blank.
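Grouping on the key alone is a sensible choice here. Here is a minimal sketch of why, assuming the blank IDs come through as NaN (which is what pandas.read_csv produces for empty fields by default). The frame below is a hypothetical variant of df1 in which one 'abc' row has blank IDs. By default, groupby drops rows whose group keys contain NaN, so grouping on all three columns would undercount that subscriber.

```python
import numpy as np
import pandas as pd

# Hypothetical df1 where the third row ('abc') has blank (NaN) IDs.
df1 = pd.DataFrame(
    {
        "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
        "OtherID": ["12", "45", np.nan, "12", "78", "45"],
        "AnotherID": ["23", "56", np.nan, "23", "90", "56"],
    }
)

# Grouping on all three columns silently skips the row with NaN keys.
by_all = df1.groupby(["Subscriber Key", "OtherID", "AnotherID"]).size()

# Grouping on the key alone counts every row for each subscriber.
by_key = df1.groupby("Subscriber Key").size()

print(by_all)  # ('abc', '12', '23') is counted only twice
print(by_key)  # abc is counted three times
```

If the blanks were empty strings rather than NaN, they would instead form their own group and not be dropped.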

I have also tried Series.value_counts(). When I use groupby and size() and assign the counts to df2['Total Instances'], the values do not line up with the correct subscribers.

For example, the new table looks like this:

        Subscriber Key  OtherID  AnotherID  Total Instances
    1   'abc'           '12'     '23'       '1'
    2   'bcd'           '45'     '56'       '3'
    3   'cde'           '78'     '90'       '2'
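A likely cause of the mismatch is that the groupby result is indexed by the subscriber keys, while df2 keeps its own row index. When a Series is assigned as a column, pandas aligns it on index labels, not on the values in the key column. The minimal sketch below shows this with a trimmed, hypothetical df1 that contains only the key column:

```python
import pandas as pd

# Trimmed df1: only the key column is needed to show the alignment issue.
df1 = pd.DataFrame({"Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"]})
df2 = df1.drop_duplicates().reset_index(drop=True)

# counts is indexed by key: abc -> 3, bcd -> 2, cde -> 1.
counts = df1.groupby("Subscriber Key").size()

# Column assignment aligns counts' index ('abc', 'bcd', 'cde') with df2's
# index (0, 1, 2). No labels match, so every value becomes NaN.
df2["Total Instances"] = counts
print(df2)
```

Assigning counts.to_numpy() instead would copy the numbers by position. That only gives the right answer if df2 happens to list the keys in the same order as the groupby output.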

My first thought was that groupby might sort its output automatically. I tried to check by saving the grouped result as a CSV, and found that it only contains the count column, not the associated Subscriber Key column.
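Both observations have a likely explanation. groupby does sort the group keys by default (sort=True), and the keys are not lost. They are stored in the index of the returned Series, so they disappear from a CSV written with index=False. The minimal sketch below turns the index back into a column with reset_index and then joins on the key instead of relying on row order. Using a merge is one option among several; the sample data is the question's df1.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
        "OtherID": ["12", "45", "12", "12", "78", "45"],
        "AnotherID": ["23", "56", "23", "23", "90", "56"],
    }
)
df2 = df1.drop_duplicates().reset_index(drop=True)

# Keys end up sorted, and they live in the index rather than in a column.
counts = df1.groupby("Subscriber Key").size()

# reset_index moves the keys into a regular column and names the count column.
counts_df = counts.reset_index(name="Total Instances")

# A left merge on the key keeps df2's rows and order and attaches the matching count.
result = df2.merge(counts_df, on="Subscriber Key", how="left")
print(result)
```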

Anyhow, does anybody have any input on how I can achieve this? To reiterate, I essentially just want to add a column to df2 that holds the total number of occurrences (instances) of each key within df1.

Thanks!

You can try:

    df2['Total Instances'] = df2['Subscriber Key'].map(df1['Subscriber Key'].value_counts())
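Here is a self-contained sketch of that one-liner on the sample data (trimmed to the key column for brevity). value_counts() returns a Series indexed by key, and Series.map treats it as a lookup table. The counts are therefore matched on the key values themselves, and df2's row order and index labels do not matter.

```python
import pandas as pd

df1 = pd.DataFrame({"Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"]})
df2 = df1.drop_duplicates().reset_index(drop=True)

# Lookup table indexed by key: abc -> 3, bcd -> 2, cde -> 1.
counts = df1["Subscriber Key"].value_counts()

# Each key in df2 is replaced by its count from the lookup table.
df2["Total Instances"] = df2["Subscriber Key"].map(counts)
print(df2)
```

Any key present in df2 but absent from df1 would get NaN instead of a count.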

