
Python - Group / count a list of numbers by custom ranges

Good evening,

I hope that you are well.

I have a list of numbers that I would like to group into "bins" or "buckets" based on ranges that I define. I would like the output to display each group "name" and the total number of values that fall within that range.

For example:

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]

Example criteria

  • greater than 250,000 (output: 1)
  • greater than 100,000 but less than or equal to 250,000 (output: 1)
  • greater than 10,000 but less than or equal to 100,000 (output: 0)
  • greater than 1,000 but less than or equal to 10,000 (output: 1)
  • greater than 100 but less than or equal to 1,000 (output: 2)
  • less than 100 (output: 3)

I could obviously achieve this in a crude way by writing multiple conditional if statements, but I am aware that there must be a more elegant way of achieving the result.

Various searches have indicated that I could possibly achieve this using pandas.cut / digitize; however, as of yet, I have been unsuccessful in achieving the required output.

Any assistance would be much appreciated.

Many thanks

James

You are right, you can use pd.cut combined with a groupby to achieve what you want.

Step 1: Define data

import pandas as pd
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
df = pd.DataFrame(my_list, columns=['data'])
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000', 'between 10,000 and 100,000', 'between 100,000 and 250,000', 'greater than 250,000']

Step 2: Generate the category name using pd.cut, and set it as the index for the groupby later

df['category'] = pd.cut(df['data'], cut_edges, labels=labels)
df.set_index('category', append=False, inplace=True)
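
A side note on Step 2: by default pd.cut treats each bin as open on the left and closed on the right (right=True), so a value that sits exactly on an edge goes into the lower bin. With the edges above, a value of exactly 100 would therefore be labelled 'less than 100'. The sketch below illustrates this with a shortened, hypothetical set of edges, labels and boundary values (none of them come from the question), and shows how right=False flips the convention.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened edges and labels used only to illustrate edge handling.
edges = [-np.inf, 100, 1000, np.inf]
labels = ['up to 100', '100 to 1,000', 'over 1,000']

# Values chosen to sit on or next to the bin edges.
boundary_values = pd.Series([99, 100, 101, 1000, 1001])

# Default: bins are (low, high], so 100 falls into the first bin.
right_closed = pd.cut(boundary_values, edges, labels=labels)

# right=False: bins are [low, high), so 100 falls into the second bin.
left_closed = pd.cut(boundary_values, edges, labels=labels, right=False)

print(right_closed.tolist())
print(left_closed.tolist())
```

The default matches the "greater than X but less than or equal to Y" wording in the question, so no change is needed there unless the boundaries should behave differently.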

Step 3: groupby to do the count

df.groupby(level='category').count()

Output:

[image not preserved: a table listing the count for each category]
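
As a possible shortcut, the same counts can be obtained without the set_index / groupby steps by calling value_counts on the categorical result of pd.cut. This is a minimal sketch rather than the method used in the answer above; it reuses the data, edges and labels from Step 1, and the names categories and counts are introduced here for illustration. A categorical Series reports every category, so the empty 'between 10,000 and 100,000' bin still appears with a count of 0.

```python
import numpy as np
import pandas as pd

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
cut_edges = [-np.inf, 100, 1000, 10000, 100000, 250000, np.inf]
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

# Assign each number to a labelled bin (right-closed intervals by default).
categories = pd.cut(pd.Series(my_list), cut_edges, labels=labels)

# Count the members of each category; sort=False avoids ordering by frequency.
counts = categories.value_counts(sort=False)

print(counts)
```

For the sample list this gives 3, 2, 1, 0, 1 and 1 for the six labels, matching the expected outputs listed in the question.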

EDIT

As pointed out in the comments, numpy.histogram is another, possibly more concise, approach that will work. This answer used pd.cut as it was specifically mentioned in the question.
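
A rough sketch of the NumPy route, assuming the same data, edges and labels as in Step 1 (the variable names hist_counts, bin_index and digitize_counts are introduced here for illustration). One caveat: numpy.histogram treats bins as closed on the left, [low, high), with only the last bin closed on both sides, which is the opposite of the pd.cut default. The sample list has no value exactly on an edge, so both give the same counts here. If the "less than or equal to" boundaries matter, numpy.digitize with right=True followed by numpy.bincount keeps the (low, high] convention.

```python
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

# Option 1: np.histogram counts values per bin, with bins [low, high).
hist_counts, _ = np.histogram(my_list, bins=cut_edges)

# Option 2: np.digitize on the interior edges with right=True assigns
# index i when edge[i-1] < x <= edge[i], i.e. bins (low, high].
bin_index = np.digitize(my_list, cut_edges[1:-1], right=True)
# minlength keeps empty bins in the result with a count of 0.
digitize_counts = np.bincount(bin_index, minlength=len(labels))

for label, count in zip(labels, digitize_counts):
    print(f"{label}: {count}")
```

Neither option carries the label names along, so they are paired with the counts by position in the final loop.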
