How do I efficiently bin values into overlapping bins using Pandas?

I would like to bin all the values from a column of type float into overlapping bins. The resulting column could be a series of 1-D boolean vectors, one vector for each value from the original column. Each vector contains True for every bin the value falls into and False for the other bins.

For example, if I have four bins [(0, 10), (7, 20), (15, 30), (30, 60)] and the original value is 9.5, the resulting vector should be [True, True, False, False].

I know how to iterate through all the ranges with a custom function using 'apply', but is there a way to perform this binning more efficiently and concisely?

Would a simple list comprehension meet your needs?

# Each bin is a (low, high) pair; both endpoints are treated as inclusive.
Bins = [(0, 10), (7, 20), (15, 30), (30, 60)]
Result = [y[0] <= 9.5 <= y[1] for y in Bins]  # [True, True, False, False]

If your data is stored in a column named data of a pandas DataFrame (df), you can define the function:

def in_ranges(x, bins):
    # True for every (low, high) bin that contains x, endpoints included.
    return [(x >= low) & (x <= high) for low, high in bins]

and apply it to the column:

import pandas as pd
df['data'].apply(lambda x: pd.Series(in_ranges(x, Bins), index=Bins))
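
Since the question asks for something faster than 'apply', one possible alternative is to compare the whole column against all bin edges at once with NumPy broadcasting. The following is only a sketch under some assumptions: the sample values in df are made up for illustration, the names lows, highs, mask, labels and membership are hypothetical, and both bin endpoints are treated as inclusive, as in the list comprehension.

```python
import numpy as np
import pandas as pd

# Bins from the question; both endpoints are treated as inclusive,
# matching the list comprehension in the answer.
bins = [(0, 10), (7, 20), (15, 30), (30, 60)]

# Hypothetical sample data; the column name "data" follows the answer's wording.
df = pd.DataFrame({"data": [9.5, 18.0, 30.0, 61.0]})

# Split the (low, high) pairs into two 1-D arrays of length n_bins.
lows = np.array([lo for lo, _ in bins])
highs = np.array([hi for _, hi in bins])

# A (n_rows, 1) column compared against (n_bins,) edges
# broadcasts to a (n_rows, n_bins) boolean matrix.
values = df["data"].to_numpy()[:, None]
mask = (values >= lows) & (values <= highs)

# One boolean column per bin, labelled with a readable string.
labels = [f"[{lo}, {hi}]" for lo, hi in bins]
membership = pd.DataFrame(mask, index=df.index, columns=labels)
print(membership)
```

The first row of membership corresponds to 9.5 and matches the vector from the question. This approach allocates an n_rows by n_bins boolean array, so it trades memory for speed; for a very large number of bins the per-row function may still be the simpler choice.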

