Python - Group / count a list of numbers by custom ranges
Good evening,
I hope that you are well.
I have a list of numbers which I would like to group into "bins" or "buckets" based on ranges that I define. I would like the output to display each group "name" and the total number of values that fall within that range.
For example:
my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
Example criteria
I could obviously achieve this in a very crude way by writing multiple conditional if statements, but I am aware that there must be a more elegant way of achieving the result.
Various searches have indicated that I could possibly achieve this using pandas.cut / digitize; however, so far I have been unsuccessful in achieving the required output.
Any assistance would be much appreciated.
Many thanks
James
You are right, you can use pd.cut combined with a groupby to achieve what you want.
Step 1: Define data
import pandas as pd
import numpy as np
my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
df = pd.DataFrame(my_list, columns=['data'])
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000', 'between 10,000 and 100,000', 'between 100,000 and 250,000', 'greater than 250,000']
Step 2: Generate the category name using pd.cut, and set the index for the groupby later
df['category'] = pd.cut(df['data'], cut_edges, labels=labels)
df.set_index('category', append=False, inplace=True)
Step 3: groupby to do the count
df.groupby(level='category').count()
Output:
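As a self-contained check, the three steps can be run together as in the sketch below. It groups on the 'category' column directly rather than on the index, and the `observed=False` argument and the `counts` name are additions for illustration, not part of the steps above. Passing `observed=False` asks pandas to keep empty bins in the result; whether they appear by default depends on the pandas version.

```python
import numpy as np
import pandas as pd

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

df = pd.DataFrame(my_list, columns=['data'])

# pd.cut assigns each value to a labelled bin; by default bins are (left, right]
df['category'] = pd.cut(df['data'], cut_edges, labels=labels)

# observed=False (an addition here) keeps bins that received no values
counts = df.groupby('category', observed=False)['data'].count()
print(counts)
# Counts in label order: 3, 2, 1, 0, 1, 1
```

Note that 'between 10,000 and 100,000' comes out as 0 because no value in my_list falls in that range.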
EDIT编辑
As pointed out in the comments, numpy.histogram is another, possibly more concise, approach that will also work. This answer used pd.cut as it was specifically mentioned in the question.
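For reference, a minimal sketch of the numpy.histogram route, reusing the same edges and labels as above. This is one way it might look, not the code from the comments. One difference to keep in mind: np.histogram treats each bin as [left, right) (with the last bin also including its right edge), whereas pd.cut defaults to (left, right], so values sitting exactly on an edge can land in different bins. None of the values in my_list sit on an edge, so the counts agree here.

```python
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
cut_edges = [-np.inf, 100, 1000, 10000, 100000, 250000, np.inf]
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

# np.histogram returns (counts per bin, bin edges); only the counts are needed
counts, _ = np.histogram(my_list, bins=cut_edges)

# Pair each label with its count
for label, count in zip(labels, counts):
    print(f"{label}: {count}")
# Counts in label order: 3, 2, 1, 0, 1, 1
```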