Compare whether a value in one column is between two values in another column (Python, pandas)
I have two data frames as follows:
A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})
B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})
I want to find whether "value" in A is between "low" and "high" in B, and if so, I want to copy the corresponding "name" from B to A.
The output should look like this:
A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29], "name":["one","two","one","four","five", "six", "five", "six"]})
My function uses iterrows as follows:
def func1(row):
    x = row['value']
    for index, value in B.iterrows():
        if ((value['low'] <= x) & (x <= value['high'])):
            return value['name']
But it doesn't yet achieve what I want to do.
Thank you,
You can use a list comprehension to iterate through the values in A, and then use loc to get the relevant mapped values. le is less than or equal to, and ge is greater than or equal to.
For example, v = 3 in the first row. Using simple boolean indexing:
>>> B[(B['low'].le(v)) & (B['high'].ge(v))]
high low name
0 5 1 one
Assuming that DataFrame B does not have any overlapping ranges, you will get back one row as above. One then uses loc to get the name column, as below. Because each returned name is a Series, you need to get the first and only scalar value (using iat, for example).
A['name'] = [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].iat[0]
for v in A['value']]
>>> A
value name
0 3 one
1 7 two
2 5 one
3 18 four
4 23 five
5 27 six
6 21 five
7 29 six
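Note that the list comprehension above relies on every value in A falling inside some range of B; if a value matches no range, the selection is empty and .iat[0] raises an IndexError. Here is a minimal sketch of a more defensive lookup. The helper name lookup_name, its default parameter, and the extra out-of-range value 99 are assumptions added for illustration and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question, plus one value (99) that fits no range.
# The 99 is an assumption added only to exercise the fallback path.
A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29, 99]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})


def lookup_name(v, default=None):
    """Hypothetical helper: name of the first range in B containing v, else default."""
    matches = B.loc[B["low"].le(v) & B["high"].ge(v), "name"]
    # An empty selection means no range contains v, so fall back to the default.
    return matches.iat[0] if len(matches) else default


A["name"] = [lookup_name(v) for v in A["value"]]
```

With this version, rows whose value lies outside every range get a missing name instead of raising an error.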
I believe you are looking for something like this:
In [1]: import pandas as pd
In [2]: A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})
In [3]:
In [3]: B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})
In [4]: A
Out[4]:
value
0 3
1 7
2 5
3 18
4 23
5 27
6 21
7 29
In [5]: B
Out[5]:
high low name
0 5 1 one
1 10 6 two
2 15 11 three
3 20 16 four
4 25 21 five
5 30 26 six
In [6]: def func1(x):
   ...:     for row in B.itertuples():
   ...:         if row.low <= x <= row.high:
   ...:             return row.name
   ...:
In [7]: A.value.map(func1)
Out[7]:
0 one
1 two
2 one
3 four
4 five
5 six
6 five
7 six
Name: value, dtype: object
In [8]: A['name'] = A['value'].map(func1)
In [9]: A
Out[9]:
value name
0 3 one
1 7 two
2 5 one
3 18 four
4 23 five
5 27 six
6 21 five
7 29 six
I use itertuples because it should be a little bit faster than iterrows, but in general this will not be very efficient. This is a solution, but there might be better ones.
In [8]: timeit A['value'].map(func1)
100 loops, best of 3: 10.5 ms per loop
In [9]: timeit [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].tolist()[0] for v in A['value']]
100 loops, best of 3: 9.06 ms per loop
A quick and dirty test shows that Alexander's approach is faster. I wonder how it scales.
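On the scaling question: both approaches timed above run a Python-level loop over A and scan B once per value, so the cost should grow roughly with len(A) * len(B). One possible vectorized alternative, which is not from either answer and is offered only as a sketch, is to build a pd.IntervalIndex from B and look up all values at once with get_indexer. It assumes, like the answers above, that the ranges in B do not overlap; the variable names intervals, pos and names are made up for this example.

```python
import pandas as pd

A = pd.DataFrame({"value": [3, 7, 5, 18, 23, 27, 21, 29]})
B = pd.DataFrame({
    "low": [1, 6, 11, 16, 21, 26],
    "high": [5, 10, 15, 20, 25, 30],
    "name": ["one", "two", "three", "four", "five", "six"],
})

# Closed intervals [low, high], matching the <= comparisons used above.
# Assumption: the ranges do not overlap; get_indexer raises if they do.
intervals = pd.IntervalIndex.from_arrays(B["low"], B["high"], closed="both")

# For each value, the position of the interval containing it, or -1 if none does.
pos = intervals.get_indexer(A["value"].to_numpy())

# Map positions to names, then blank out the rows where no interval matched
# (a -1 position would otherwise silently pick the last row of B).
names = pd.Series(B["name"].to_numpy()[pos], index=A.index)
A["name"] = names.where(pos >= 0)
```

I have not benchmarked this against the two loops above, so any speed-up on large frames would need to be measured with timeit on realistic data.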