Determining if a column value is between a conditional range based on another column
I have a dataframe that looks as follows:
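import pandas as pd

# a plain list of rows (rather than np.array) keeps A and the bound columns numeric;
# np.array would coerce every value to a string because column B holds text
data = [[5, 'red', 2, 6, 8, 10],
        [11, 'red', 3, 9, 6, 15],
        [8, 'blue', 0, 3, 5, 10],
        [2, 'blue', 1, 2, 3, 4]]
df = pd.DataFrame(data, columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'])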
A B red_lower red_upper blue_lower blue_upper
0 5 red 2 6 8 10
1 11 red 3 9 6 15
2 8 blue 0 3 5 10
3 2 blue 1 2 3 4
I'd like to create an additional column that tells me whether the value in column A is within the range for the color given in column B. For example, in row 0, 5 is designated red, so I check whether 5 is between 2 and 6 (red_lower and red_upper). It is, so the new column should contain a 1.
Desired result:
A B red_lower red_upper blue_lower blue_upper in_range
0 5 red 2 6 8 10 1
1 11 red 3 9 6 15 0
2 8 blue 0 3 5 10 1
3 2 blue 1 2 3 4 0
I've tried to write a loop, but I'm getting many errors involving Series. I really don't want to have to split up the dataframe by color, but maybe that's the way to go? (In my actual dataframe, there are six different colors, not just two.)
Thank you!
EDIT: Bonus if the additional column can also tell me whether the value is above or below the range! For example, in row 1, 11 is outside the range 3 to 9, so it is too high. The table should look like this:
A B red_lower red_upper blue_lower blue_upper in_range
0 5 red 2 6 8 10 inside
1 11 red 3 9 6 15 above
2 8 blue 0 3 5 10 inside
3 2 blue 1 2 3 4 below
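A minimal sketch of a fully vectorized check, not taken from any of the answers below: it looks up each row's own bound columns by name with Index.get_indexer and then picks one cell per row with NumPy fancy indexing, so no split by color is needed. It assumes every color in B has matching <color>_lower and <color>_upper columns, and it treats the bounds as inclusive, which the question doesn't specify (the first answer below uses strict comparisons; the sample data gives the same result either way).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[5, 'red', 2, 6, 8, 10],
     [11, 'red', 3, 9, 6, 15],
     [8, 'blue', 0, 3, 5, 10],
     [2, 'blue', 1, 2, 3, 4]],
    columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'],
)

# Position of each row's own bound columns, e.g. 'red' -> 'red_lower' / 'red_upper'
lower_pos = df.columns.get_indexer(df['B'] + '_lower')
upper_pos = df.columns.get_indexer(df['B'] + '_upper')
if (lower_pos < 0).any() or (upper_pos < 0).any():
    raise KeyError('some color in B has no matching _lower/_upper column')

# Pick one cell per row: (row i, column lower_pos[i]) and (row i, column upper_pos[i])
rows = np.arange(len(df))
values = df.to_numpy()
lower = values[rows, lower_pos].astype(float)
upper = values[rows, upper_pos].astype(float)
a = df['A'].to_numpy(dtype=float)

# 1/0 flag (inclusive bounds) plus a below/above/inside label
df['in_range'] = ((a >= lower) & (a <= upper)).astype(int)
df['in_range_flag'] = np.select([a < lower, a > upper], ['below', 'above'], default='inside')
print(df[['A', 'B', 'in_range', 'in_range_flag']])
```

Because the lookup is by column name, adding more colors only means adding more <color>_lower / <color>_upper columns; the code itself does not change.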
justify + broadcast + mask + logical_and
You can use some nifty broadcasting here, along with the justify function from another answer. This assumes that each color has a single valid range. It also assumes that all of your numeric columns are in fact numeric.
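A minimal sketch of enforcing the "numeric columns are numeric" assumption. If the frame is built from a NumPy array that mixes text and numbers, NumPy stores every element as a string, so comparisons become lexicographic ('11' > '9' is False). Converting each non-color column with pd.to_numeric restores real numbers; treating every column except B as numeric is an assumption based on the sample layout.

```python
import numpy as np
import pandas as pd

# Mixed strings and integers: np.array falls back to a single Unicode string dtype
data = np.array([[5, 'red', 2, 6, 8, 10],
                 [11, 'red', 3, 9, 6, 15]])
df = pd.DataFrame(data, columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'])

# Every column except the color label should hold numbers; convert them one by one
num_cols = df.columns.drop('B')
for c in num_cols:
    df[c] = pd.to_numeric(df[c])

print(df.dtypes)
```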
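import numpy as np

values = df.A.to_numpy()
colors = df.B.to_numpy()
# the bound columns: red_lower, red_upper, blue_lower, blue_upper
range_frame = df.iloc[:, 2:]
# color prefix of each bound column, e.g. ['red', 'red', 'blue', 'blue']
ranges = range_frame.columns.str.split('_').str[0].to_numpy()
# (n_rows, 1) against (n_cols,) broadcasts to (n_rows, n_cols):
# True wherever a column belongs to a different color than the row
m = colors[:, None] != ranges
# blank out the other colors' bounds, then shift each row's two remaining bounds to the left
masked = range_frame.mask(m)
jf = justify(masked.to_numpy(), invalid_val=np.nan)[:, :2]
# 1 if strictly between lower and upper, else 0
ir = np.logical_and(jf[:, 0] < values, values < jf[:, 1]).astype(int)
c1 = values <= jf[:, 0]
c2 = values >= jf[:, 1]
irl = np.select([c1, c2], ['below', 'above'], 'inside')
df.assign(in_range=ir, in_range_flag=irl)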
A B red_lower red_upper blue_lower blue_upper in_range in_range_flag
0 5 red 2 6 8 10 1 inside
1 11 red 3 9 6 15 0 above
2 8 blue 0 3 5 10 1 inside
3 2 blue 1 2 3 4 0 below
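The mask step is where the broadcasting happens. Putting the per-row colors on the first axis (colors[:, None]) and the per-column color prefixes on the second gives a rows × columns boolean grid, which is the shape DataFrame.mask expects. The sample frame is 4 × 4 and its color sequence matches its column prefixes, which hides which axis is which, so this sketch uses a hypothetical 3-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row frame, so rows (3) and bound columns (4) differ in count
range_frame = pd.DataFrame(
    [[2, 6, 8, 10], [3, 9, 6, 15], [0, 3, 5, 10]],
    columns=['red_lower', 'red_upper', 'blue_lower', 'blue_upper'],
)
colors = np.array(['red', 'blue', 'blue'])  # one color per row

# Color prefix of each column, equivalent to .str.split('_').str[0]
ranges = np.array([c.split('_')[0] for c in range_frame.columns])

# (3, 1) against (4,) broadcasts to (3, 4): True where the column belongs to another color
m = colors[:, None] != ranges
print(m.shape)  # (3, 4)

# mask() sets every cell where m is True to NaN, leaving only each row's own bounds
masked = range_frame.mask(m)
print(masked)
```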
stack + reshape + logical_and
Again making the same assumptions as the first answer.
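# long format: one row per (A, B, bound column), with the bound value in column 0
u = df.set_index(['A', 'B']).stack().rename_axis(['A', 'B', 'flag']).reset_index()
# keep only the bounds that belong to each row's own color
frame = u[u.flag.str.split('_').str[0] == u.B]
# the surviving rows come in (lower, upper) pairs, one pair per original row
values = frame[::2].A.values
ranges = frame[0].values.reshape(-1, 2)
ir = np.logical_and(ranges[:, 0] < values, values < ranges[:, 1]).astype(int)
c1 = values <= ranges[:, 0]
c2 = values >= ranges[:, 1]
irl = np.select([c1, c2], ['below', 'above'], 'inside')
df.assign(in_range=ir, in_range_flag=irl)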
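To see what the stack-based answer works with, here is a sketch that prints the intermediate frame for a hypothetical two-row version of the data. The reshape(-1, 2) step relies on each color's _lower column coming before its _upper column in the original frame, so the filtered rows arrive as (lower, upper) pairs:

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 'red', 2, 6, 8, 10],
     [8, 'blue', 0, 3, 5, 10]],
    columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'],
)

# stack() turns each original row into one long-format row per bound column
u = df.set_index(['A', 'B']).stack().rename_axis(['A', 'B', 'flag']).reset_index()

# Keep only the bounds whose color prefix matches the row's own color
frame = u[u.flag.str.split('_').str[0] == u.B]
print(frame)

# Rows come in (lower, upper) pairs, so reshaping the value column gives one pair per row
ranges = frame[0].values.reshape(-1, 2)
print(ranges.tolist())  # [[2, 6], [5, 10]]
```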
Here is the definition of the justify function by @Divakar:
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array.

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as empty; the valid entries are pushed past it
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'.
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
    """
    # mask of valid entries; NaN needs isnan because NaN != NaN
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # sorting a boolean mask moves every True to the end of its row (or column)
    justified_mask = np.sort(mask, axis=axis)
    if side == 'up' or side == 'left':
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
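The core trick in justify is sorting the boolean validity mask: within each row, np.sort puts every False before every True, and flipping the result moves the True cells to the left. A stripped-down sketch of only the left-justify, NaN case, applied to the kind of masked array the first answer produces (values taken from the sample rows):

```python
import numpy as np

# Each row keeps only its own color's bounds; the rest are NaN
a = np.array([[2.0, 6.0, np.nan, np.nan],
              [np.nan, np.nan, 5.0, 10.0]])

# True where a value is present
mask = ~np.isnan(a)
# Sort each row (False first), then flip so the True cells sit on the left
justified_mask = np.flip(np.sort(mask, axis=1), axis=1)

out = np.full(a.shape, np.nan)
# Both masks select the same number of cells per row, so row-major assignment lines values up
out[justified_mask] = a[mask]
print(out[:, :2].tolist())  # [[2.0, 6.0], [5.0, 10.0]]
```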
Here is an approach using groupby to split the DataFrame by color. Each group's bound columns are selected from the group's name, so you do not need to type in each color by hand.
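import numpy as np
import pandas as pd

in_range = pd.Series(index=df.index, dtype=object)
for name, x in df.groupby('B', sort=False):
    # this color's bound columns, e.g. red_lower and red_upper
    bounds = x.filter(like=name)
    s1 = (x.A >= bounds.iloc[:, 0]) & (x.A <= bounds.iloc[:, 1])
    s2 = x.A < bounds.iloc[:, 0]
    # write back by index label so rows line up even if colors are interleaved
    in_range.loc[x.index] = np.select([s1, s2], ['inside', 'below'], default='above')
df['in_range'] = in_range
df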
A B red_lower red_upper blue_lower blue_upper in_range
0 5 red 2 6 8 10 inside
1 11 red 3 9 6 15 above
2 8 blue 0 3 5 10 inside
3 2 blue 1 2 3 4 below
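Collecting group results in a list and assigning it positionally only matches the original row order when each color's rows are contiguous, as they happen to be in the sample data. A small sketch with hypothetical interleaved colors, comparing that positional approach with assigning by index label:

```python
import numpy as np
import pandas as pd

# Hypothetical data with interleaved colors: red, blue, red
df = pd.DataFrame(
    [[5, 'red', 2, 6, 8, 10],
     [8, 'blue', 0, 3, 5, 10],
     [11, 'red', 3, 9, 6, 15]],
    columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'],
)

naive = []                                          # labels in group order
aligned = pd.Series(index=df.index, dtype=object)   # labels keyed by row label
for name, x in df.groupby('B', sort=False):
    bounds = x.filter(like=name)
    lower, upper = bounds.iloc[:, 0], bounds.iloc[:, 1]
    labels = np.select([x.A < lower, x.A > upper], ['below', 'above'], default='inside')
    naive.extend(labels.tolist())
    aligned.loc[x.index] = labels

print(naive)             # ['inside', 'above', 'inside']  (red rows first, then blue)
print(aligned.tolist())  # ['inside', 'inside', 'above']  (matches df's row order)
```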