Check whether values in one column fall between two values in another DataFrame (Python pandas)

Question

I have two data frames as follows:

A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})

B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})

I want to find whether "value" in A is between "low" and "high" in B, and if so, I want to copy the "name" column from B to A.

The output should look like this:

A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29], "name":["one","two","one","four","five", "six", "five", "six"]})

My function uses iterrows as follows:

def func1(row):
    x = row['value']
    for index,value in B.iterrows():
        if ((value['low'] <= x) &(x<=value['high'])):
            return value['name']

But it doesn't yet achieve what I want to do.

Thank you.

Answer 1

You can use a list comprehension to iterate through the values in A, and then use loc to get the relevant mapped values. le means "less than or equal to", and ge means "greater than or equal to".

For example, v = 3 in the first row. Using simple boolean indexing:

>>> B[(B['low'].le(v)) & (B['high'].ge(v))]
   high  low name
0     5    1  one

Assuming that DataFrame B does not have any overlapping ranges, you will get back at most one row per value (exactly one here, since every value in A falls inside a range). Then use loc to select the name column, as below. Because each selection is a Series, you need to extract its first and only scalar value (using iat, for example).

A['name'] = [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].iat[0] 
             for v in A['value']]

>>> A
   value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six
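One caveat: iat[0] raises an IndexError when a value falls outside every range in B. The following is a minimal sketch of one way to guard against that, not part of the original answer. The helper name lookup_name and the extra value 40 are hypothetical, added only to illustrate the no-match case.

```python
import pandas as pd

# Same frames as in the question; 40 is a hypothetical value outside every range.
A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29, 40]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})


def lookup_name(v, ranges=B):
    """Return the name of the first range containing v, or None if no range does."""
    matches = ranges.loc[ranges["low"].le(v) & ranges["high"].ge(v), "name"]
    # An empty selection means v is outside every [low, high] range.
    return matches.iat[0] if not matches.empty else None


A["name"] = [lookup_name(v) for v in A["value"]]
print(A)
```

Values with no matching range end up as missing entries in the name column instead of stopping the whole assignment with an exception.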

Answer 2

I believe you are looking for something like this:

In [1]: import pandas as pd

In [2]: A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})

In [3]: B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})

In [4]: A
Out[4]: 
   value
0      3
1      7
2      5
3     18
4     23
5     27
6     21
7     29

In [5]: B
Out[5]: 
   high  low   name
0     5    1    one
1    10    6    two
2    15   11  three
3    20   16   four
4    25   21   five
5    30   26    six

In [6]: def func1(x):
   ...:     for row in B.itertuples():
   ...:         if row.low <= x <= row.high:
   ...:             return row.name
   ...:         

In [7]: A.value.map(func1)
Out[7]: 
0     one
1     two
2     one
3    four
4    five
5     six
6    five
7     six
Name: value, dtype: object

In [8]: A['name'] = A['value'].map(func1)

In [9]: A
Out[9]: 
   value  name
0      3   one
1      7   two
2      5   one
3     18  four
4     23  five
5     27   six
6     21  five
7     29   six

I use itertuples because it should be a little faster than iterrows, but in general this will not be very efficient. This is one solution, but there may be better ones.
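If the Python-level loop becomes a bottleneck, one possible alternative (a swapped-in technique, not the method used in this answer) is to compare every value against every range at once with NumPy broadcasting. This is a sketch under the assumption that len(A) * len(B) booleans fit comfortably in memory. The variable names inside, first_hit and has_hit are illustrative only.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})

values = A["value"].to_numpy()[:, None]  # shape (len(A), 1)
low = B["low"].to_numpy()                # shape (len(B),)
high = B["high"].to_numpy()              # shape (len(B),)

# inside[i, j] is True when A.value[i] lies in [B.low[j], B.high[j]].
inside = (low <= values) & (values <= high)

first_hit = inside.argmax(axis=1)  # position of the first matching range per value
has_hit = inside.any(axis=1)       # False where no range matched at all

names = pd.Series(B["name"].to_numpy()[first_hit], index=A.index)
A["name"] = names.where(has_hit)   # blank out rows that matched nothing
print(A)
```

The intermediate boolean matrix grows with the product of the two frame lengths, so this trades memory for speed and is best suited to a modest number of ranges.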

Edited to add:

In [8]: timeit A['value'].map(func1)
100 loops, best of 3: 10.5 ms per loop

In [9]: timeit [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].tolist()[0] for v in A['value']]
100 loops, best of 3: 9.06 ms per loop

A quick and dirty test shows that Alexander's approach is faster. I wonder how it scales.
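On the scaling question, both approaches above scan B once per value in A. A different technique that may scale better, offered here as a hedged sketch and not benchmarked in this thread, is to build a pandas IntervalIndex from B and look up all values in one call. It assumes the ranges in B do not overlap; get_indexer returns -1 for values that fall in no interval.

```python
import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})

# closed="both" makes each interval include its low and high endpoints,
# matching the <= comparisons used in the answers above.
intervals = pd.IntervalIndex.from_arrays(B["low"], B["high"], closed="both")

# Position in B of the interval containing each value, or -1 if none contains it.
pos = intervals.get_indexer(A["value"].to_numpy())

names = B["name"].to_numpy()
A["name"] = [names[p] if p != -1 else None for p in pos]
print(A)
```

To compare the approaches fairly, rerun the timeit measurements on frames of realistic size, since the eight-row sample is too small to say much about scaling.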

Answer 1: 4 (accepted), 2016-07-07 04:23:03

Answer 2: 1, 2016-07-07 04:04:36
