Python - Group/count a list of numbers by custom ranges

Question

Good evening,

I hope that you are well.

I have a list of numbers which I would like to group into "bins" or "buckets" based on ranges that I define. I would like the output to display each group "name" and the total number of values that fall within that range.

For example:

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]

Example criteria

greater than 250,000 (output: 1)
greater than 100,000 but less than or equal to 250,000 (output: 1)
greater than 10,000 but less than or equal to 100,000 (output: 0)
greater than 1,000 but less than or equal to 10,000 (output: 1)
greater than 100 but less than or equal to 1,000 (output: 2)
less than 100 (output: 3)

I could obviously achieve this in a very crude way by writing multiple conditional if statements, but I am aware that there must be a more elegant way of achieving the result.

Various searches have indicated that I could possibly achieve this using pandas.cut / digitize; however, as of yet, I have been unsuccessful in achieving the required output.

Any assistance would be much appreciated.

Many thanks

James

Answer 1

You are right: you can use pd.cut combined with a groupby to achieve what you want.

Step 1: Define data

import pandas as pd
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
df = pd.DataFrame(my_list, columns=['data'])
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000', 'between 10,000 and 100,000', 'between 100,000 and 250,000', 'greater than 250,000']

Step 2: Generate the category name using pd.cut, and set it as the index for the groupby later

df['category'] = pd.cut(df['data'], cut_edges, labels=labels)
df.set_index('category', append=False, inplace=True)

Step 3: Use groupby to do the count

df.groupby(level='category').count()

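Output:

                             data
category
less than 100                   3
between 100 and 1,000           2
between 1,000 and 10,000        1
between 10,000 and 100,000      0
between 100,000 and 250,000     1
greater than 250,000            1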
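Steps 1 to 3 can also be condensed so that the counts come back as a single Series. The following is only a sketch of one possible variant, not the answerer's exact code: it groups by the category column instead of setting it as the index, and it passes observed=False explicitly (an assumption on my part, added because the default for that parameter differs between pandas versions) so that empty bins are reported as 0 rather than dropped.

```python
import numpy as np
import pandas as pd

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
cut_edges = [-np.inf, 100, 1000, 10000, 100000, 250000, np.inf]
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

df = pd.DataFrame({'data': my_list})

# pd.cut uses right-closed intervals by default, i.e. (100, 1000], (1000, 10000], ...
df['category'] = pd.cut(df['data'], bins=cut_edges, labels=labels)

# observed=False keeps categories that contain no values (count 0)
counts = df.groupby('category', observed=False)['data'].count()
print(counts)
```

Because the bins are right-closed, a value of exactly 100 would be counted under 'less than 100' here, which is worth keeping in mind since the question's criteria do not say where 100 itself belongs.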

EDIT

As pointed out in the comments, numpy.histogram is another, possibly more concise, approach that will also work. This answer used pd.cut because it was specifically mentioned in the question.
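For comparison, here is a minimal sketch of how the numpy.histogram approach might look. The variable names bin_edges and labels are illustrative, and the edges are simply reused from the pd.cut example. One difference to be aware of: numpy.histogram treats every bin except the last as half-open, [left, right), whereas pd.cut defaults to right-closed bins, (left, right]. For the sample data no value sits exactly on an edge, so both give the same counts, but a value such as 1000 would land in a different bin.

```python
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
bin_edges = [-np.inf, 100, 1000, 10000, 100000, 250000, np.inf]
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

# np.histogram returns (counts, edges); only the counts are needed here
counts, _ = np.histogram(my_list, bins=bin_edges)

# Pair each label with the count for its bin
for label, count in zip(labels, counts):
    print(f"{label}: {count}")
```

If the exact "greater than X but less than or equal to Y" semantics from the question matter, pd.cut with its default right-closed bins is the closer match.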
