
Count number of times a level occurs within a cluster/group in Python dataframe

I have a dataframe with clusters. In this dataframe, I want to count the number of times a particular value occurs inside a cluster. For example:

import pandas as pd

data = {'cluster': ['1001', '1001', '1001', '1002', '1002', '1002'],
        'attribute': ['1', '2', '1', '1', '2', '2']}

df = pd.DataFrame(data)

df

I want to count how many times '1' occurs inside each cluster. I have tried using lambda functions; averaging inside the cluster works, but counting does not.

For averaging, I used:

df['newcol'] = df.groupby('cluster')['attribute'].transform(lambda x: x.mean())
df

Using the same approach, but with mean replaced by count:

df['newcol'] = df.groupby('cluster')['attribute'].transform(lambda x: x.count('2'))
df

This gives me the following error:

Error: 'Requested level (3) does not match index name (None)'

Ideally, I want to add the count as an additional column, which is why I am using the lambda function.

Please help me solve this. If any additional detail is required, or if I was not clear, I'd be happy to add information!

Edit

Thank you, @Rutger has provided what I was looking for. In short, I was looking to create a new column that shows how many times each row's attribute occurs in its cluster. I also needed it to be generalizable, so that the count could be calculated for all the attributes.

On a separate note, my dataframe consists of around 600,000 rows. Is there a recommended way to take a chunk out of this dataset so that I could do my work on that? If there's a similar answer somewhere else, kindly point me towards it. Thank you!

There are many ways of doing it. I would group by both columns and then see how frequently each combination occurs. I assume this is not the most straightforward method, but I think it gives the result you are looking for.

df['count'] = df.set_index(['cluster', 'attribute']).index.map(df.groupby(['cluster', 'attribute']).size())
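As a rough alternative sketch (not part of the original answer), the same per-row counts can probably be obtained without the set_index/map step by grouping on both columns and broadcasting the group size back with transform('size'). The sample data is the one from the question; the column name 'count' follows the one-liner above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'cluster': ['1001', '1001', '1001', '1002', '1002', '1002'],
    'attribute': ['1', '2', '1', '1', '2', '2'],
})

# Group on (cluster, attribute) and write each group's size back to its rows,
# so every row shows how often its own attribute occurs in its cluster.
df['count'] = df.groupby(['cluster', 'attribute'])['attribute'].transform('size')

print(df)
```

Because the grouping covers every attribute value at once, this does not need to be repeated per attribute.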

Since you want to add a column alongside the existing columns to show the number of 1's in a cluster (group), you can keep using .transform() as you are doing now.

Inside .transform(), you can use a lambda function to check which elements equal '1' and take the sum() (instead of count) of the resulting True entries, as follows:

df['newcol'] = df.groupby('cluster')['attribute'].transform(lambda x: x.eq('1').sum())

Result:

print(df)


  cluster attribute   newcol
0    1001         1        2
1    1001         2        2
2    1001         1        2
3    1002         1        1
4    1002         2        1
5    1002         2        1
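To illustrate why sum() is used here rather than count(), the following minimal sketch looks at a single cluster in isolation. The variable names are made up for the illustration. Series.count() with no arguments counts non-null entries regardless of their value, whereas summing the boolean mask from eq('1') counts only the matches.

```python
import pandas as pd

# The 'attribute' values of cluster '1001' from the question.
attribute = pd.Series(['1', '2', '1'])

# count() ignores the values themselves and only counts non-null entries.
n_non_null = attribute.count()

# eq('1') gives a boolean mask; True counts as 1 when summed.
mask = attribute.eq('1')
n_ones = mask.sum()

print(n_non_null, n_ones)  # prints: 3 2
```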


