Pandas between min max column returning lookup

I see a few questions about looking up a single value within a range, like this one, but I need something that works across all rows and performs better.

import pandas as pd

# I have some dataset (10k to 1M rows)
values = pd.DataFrame([["foo", 5], ["bar", 15]], columns=["foobar", "values"])

# and a lookup table (25 rows)
lookups = pd.DataFrame([["A1", 0, 10], ["A2", 10, 20]], columns=["tier", "min", "max"])

My desired outcome is a tier lookup based on the value in values, matched against the min and max range in the lookup table:

    foobar  values  tier
0      foo       5    A1
1      bar      15    A2

And I've got something working, but it scales really poorly:

def lookup(score):
    for idx, row in lookups.iterrows():
        if row["min"] <= score < row["max"]:
            return row["tier"]

values["tier"] = values["values"].apply(lookup)

My second thought was to create a dataframe where the index is just (0-lookup.max.max()] with the tiers repeated/tiled, but I was hoping there was a more built-in option?
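
For illustration, the tiled-index idea might look roughly like the sketch below. It assumes integer values and lookup ranges that are sorted and contiguous, and it treats each range as [min, max) to match the lookup function above; the names widths and tiled are made up for this example.

```python
import numpy as np
import pandas as pd

values = pd.DataFrame([["foo", 5], ["bar", 15]], columns=["foobar", "values"])
lookups = pd.DataFrame([["A1", 0, 10], ["A2", 10, 20]], columns=["tier", "min", "max"])

# Number of integers covered by each tier's [min, max) range
widths = (lookups["max"] - lookups["min"]).to_numpy()

# One entry per integer between the smallest min and the largest max,
# with each tier repeated across its own range (assumes contiguous ranges)
tiled = pd.Series(
    np.repeat(lookups["tier"].to_numpy(), widths),
    index=np.arange(lookups["min"].iloc[0], lookups["max"].iloc[-1]),
)

# Vectorized lookup: each value is used as a key into the tiled index
values["tier"] = values["values"].map(tiled)
```

This avoids the Python-level loop, but the tiled Series grows with the size of the value range, and values outside the range come back as NaN.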

Thanks

This is pd.cut:

values['tier'] = pd.cut(
    values['values'],
    bins=list(lookups['min']) + [lookups['max'].iloc[-1]],
    labels=lookups['tier'],
)

Output:

  foobar  values tier
0    foo       5   A1
1    bar      15   A2
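
One detail worth checking: by default pd.cut closes each bin on the right, so a value exactly equal to a boundary (10 here) lands in the lower tier, whereas the loop in the question uses min <= score < max. A minimal sketch of matching that behaviour with right=False follows; the extra "baz" row and the edges variable are added here only to show the boundary case, and the ranges are assumed to be sorted and contiguous.

```python
import pandas as pd

# "baz" with value 10 is a hypothetical row added to exercise the boundary
values = pd.DataFrame(
    [["foo", 5], ["bar", 15], ["baz", 10]], columns=["foobar", "values"]
)
lookups = pd.DataFrame([["A1", 0, 10], ["A2", 10, 20]], columns=["tier", "min", "max"])

# Bin edges: every "min" plus the final "max" -> [0, 10, 20]
edges = list(lookups["min"]) + [lookups["max"].iloc[-1]]

# right=False makes each bin [min, max), i.e. min <= value < max
values["tier"] = pd.cut(
    values["values"],
    bins=edges,
    labels=lookups["tier"].tolist(),
    right=False,
)
```

With right=False the value 10 is assigned to A2, as in the original loop. Values outside every bin (below the first min, or at or above the last max) become NaN.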

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address.