
Grouping unique values with low value counts

Original post: 2020-08-09 07:27:08 2 1 python

My DataFrame contains over 40 unique values for a particular attribute. I want to visualise this data, but fitting all 40 values into one chart is challenging. Using wine['country'].value_counts(), I can see the frequency of each unique value.

When I create, for example, a bar chart, I would like all unique values with a count below 100 to be grouped together into their own single bar in the visualisation (called, say, 'rest' or 'other').

Is there any way of doing this?

1 Answer

Initialise a variable x = 0. Iterate through wine['country'].value_counts() using a for loop. On each iteration, check whether that country's count is less than 100; if it is, add the count to x. This way x ends up holding the sum of all counts that are below 100.
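The loop described above could be sketched roughly as follows. The sample wine DataFrame here is made up purely for illustration (the real data is not shown in the question), and the threshold of 100 comes from the question itself.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's wine DataFrame.
wine = pd.DataFrame(
    {"country": ["US"] * 150 + ["France"] * 120 + ["Chile"] * 60 + ["Peru"] * 30}
)

# Frequency of each unique country, sorted from most to least common.
counts = wine["country"].value_counts()

x = 0  # running total of all counts that fall below the threshold
for country, count in counts.items():
    if count < 100:
        x += count

print(x)
```

With this made-up data, only Chile and Peru fall below 100, so x is the sum of their two counts.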

Before charting, create a new dataframe of country versus value_counts() that keeps only the rows whose count is 100 or more. Then manually add another row named 'other' to this new dataframe, with x as its count. Use this new dataframe for charting.
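One possible way to build that charting dataframe is sketched below. It swaps the explicit loop for boolean indexing, which should give the same total. The sample data, the THRESHOLD constant, and the names large, other and chart_data are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's wine DataFrame.
wine = pd.DataFrame(
    {"country": ["US"] * 150 + ["France"] * 120 + ["Chile"] * 60 + ["Peru"] * 30}
)

THRESHOLD = 100  # counts below this are merged into 'other'
counts = wine["country"].value_counts()

# Countries that keep their own bar, and the combined total of the rest.
large = counts[counts >= THRESHOLD]
x = counts[counts < THRESHOLD].sum()

# Turn the Series into a two-column dataframe: country, count.
chart_data = large.rename_axis("country").reset_index(name="count")

# Append the 'other' row holding the combined total.
other = pd.DataFrame({"country": ["other"], "count": [x]})
chart_data = pd.concat([chart_data, other], ignore_index=True)

print(chart_data)
```

From there, something like chart_data.plot.bar(x="country", y="count") should draw the bar chart, assuming matplotlib is installed.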