
Python Real Time Binning of Data

I am totally lost when it comes to efficiently binning data in real time. What I am trying to do is assign a given value to a dictionary (or some other structure, if there is one that's more efficient).

For example, if I know that the data ranges between 0 and 100 (or some other custom bounds) and I have ten bins, so that bin 1 covers 0 to 10, and so on, what would be the best implementation so that I can simply drop the value into the data structure and it will automatically know where to put it?

I've looked here, but that applies when you have all the data together, not when it's coming in in real time.

My current design simply loops to identify which bin a value belongs to, but that is too slow when there are lots of incoming data points and the iteration runs to 100k loops.

I think bisect may be what you want. This is based on the example in the docs:

from bisect import bisect

d = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 0, "F": 0}


def grade(score, breakpoints=[70, 80, 90, 100], grades='FBCDA'):
    i = bisect(breakpoints, score)
    return grades[i]


for n in [66, 67, 77, 88, 80, 90, 91, 100]:
    d[grade(n)] += n
print(d)
{'A': 100, 'B': 77, 'C': 168, 'D': 181, 'E': 0, 'F': 133}
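To tie the grade example back to the question, here is a minimal sketch of the same bisect idea applied to ten equal-width bins over 0 to 100, keeping one counter per bin. The names `edges`, `counts`, and `add_value` are made up for illustration, and the sketch does no range checking: anything below 0 lands in the first bin and anything from 90 upward lands in the last.

```python
from bisect import bisect_right

# Hypothetical setup: ten equal-width bins over [0, 100).
# Only the interior edges are listed; bisect_right turns a value
# into its bin index with a binary search over these edges.
edges = [10, 20, 30, 40, 50, 60, 70, 80, 90]
counts = [0] * (len(edges) + 1)


def add_value(x):
    """Drop one incoming value into its bin as it arrives."""
    counts[bisect_right(edges, x)] += 1


for x in [1, 13, 21, 14, 13, 9, 11, 10, 0, 42]:
    add_value(x)

print(counts)
```

Because the edges are an ordinary sorted list, the same code should also work for bins of unequal width, which is where bisect tends to be more useful than index arithmetic.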

I wrote this on the basis that bin 0 = [Min, Min + (Max-Min)/Nbins):

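class bins():
    """Fixed-width bins over [Min, Max); the bin index is computed directly."""
    def __init__(self, Min, Max, Nbins):
        self.bins = {}
        self.Min = float(Min)
        self.Max = float(Max)
        self.Nbins = Nbins
        # One empty list per bin, keyed by bin index 0..Nbins-1.
        for k in range(0, Nbins):
            self.bins[k] = []

    def AddToMap(self, n):
        # The range is half-open, so n == Max is rejected.
        if n < self.Min or n >= self.Max:
            print("Object out of map range. [ " + str(n) + " ]")
        else:
            # Offset from Min divided by the bin width gives the bin index.
            k = int((n - self.Min) / ((self.Max - self.Min) / float(self.Nbins)))
            self.bins[k].append(n)

    def prt(self):
        for k in self.bins:
            print(self.bins[k])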

b = bins(0,100,10)
b.AddToMap(1)
b.AddToMap(13)
b.AddToMap(21)
b.AddToMap(14)
b.AddToMap(13)
b.AddToMap(9)
b.AddToMap(11)
b.AddToMap(10)
b.AddToMap(0)
b.AddToMap(100)
b.AddToMap(42)

b.prt()

yielding:

Object out of map range. [ 100 ]
[1, 9, 0]
[13, 14, 13, 11, 10]
[21]
[]
[42]
[]
[]
[]
[]
[] 
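In the run above, 100 is rejected because the range is half-open. If the upper bound should count as part of the last bin, and only per-bin counts are needed rather than the values themselves, a variant along these lines might do. This is only a sketch: the class name `StreamingHistogram` and its attributes are hypothetical and not part of the answer above.

```python
class StreamingHistogram:
    """Hypothetical variant: keeps a count per bin instead of the values."""

    def __init__(self, lo, hi, nbins):
        self.lo = float(lo)
        self.hi = float(hi)
        self.nbins = nbins
        self.width = (self.hi - self.lo) / nbins
        self.counts = [0] * nbins
        self.rejected = 0  # number of values outside [lo, hi]

    def add(self, x):
        if x < self.lo or x > self.hi:
            self.rejected += 1
            return
        # min() folds x == hi (and any float round-up) into the last bin.
        k = min(int((x - self.lo) / self.width), self.nbins - 1)
        self.counts[k] += 1


h = StreamingHistogram(0, 100, 10)
for x in [1, 13, 21, 14, 13, 9, 11, 10, 0, 100, 42, 250]:
    h.add(x)

print(h.counts)
print(h.rejected)
```

Storing counts keeps memory use constant however many points arrive, at the cost of no longer being able to list the individual values in each bin.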
