Determining if a column value is between a conditional range based on another column

I have a dataframe that looks as follows:

    import numpy as np
    import pandas as pd

    data = np.array([[5, 'red', 2, 6, 8, 10],
                     [11, 'red', 3, 9, 6, 15],
                     [8, 'blue', 0, 3, 5, 10],
                     [2, 'blue', 1, 2, 3, 4]])
    df = pd.DataFrame(data, columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'])
    A     B red_lower red_upper blue_lower blue_upper
0   5   red         2         6          8         10
1  11   red         3         9          6         15
2   8  blue         0         3          5         10
3   2  blue         1         2          3          4

I'd like to create an additional column that tells me whether the value in column A is in the range of the color given in column B. For example, in row 0, since 5 has the designation red, I will check if 5 is between 2 and 6. It is, so the new column gets a 1.

Desired result:

    A     B red_lower red_upper blue_lower blue_upper in_range
0   5   red         2         6          8         10        1
1  11   red         3         9          6         15        0
2   8  blue         0         3          5         10        1
3   2  blue         1         2          3          4        0

I've tried to write a loop, but I'm getting many Series errors. I really don't want to have to split up the dataframe (by color), but maybe that's the way to go? (In my actual dataframe, there are six different 'colors', not just two.)

Thank you!

EDIT: bonus if the additional column can tell me whether the value is above or below the range! For example, in row 1, 11 is outside the range, so it is too high. The table should look like this:

    A     B red_lower red_upper blue_lower blue_upper in_range
0   5   red         2         6          8         10   inside
1  11   red         3         9          6         15    above
2   8  blue         0         3          5         10   inside
3   2  blue         1         2          3          4    below

justify + broadcast + mask + logical_and

You can use some nifty broadcasting here, together with the function justify from another answer (its definition is given further down). This assumes that each color has a single valid range. It also assumes that all of your numeric columns are in fact numeric.
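
If the frame is built with np.array as in the question, every value (numbers included) ends up stored as a string, so the numeric assumption does not hold out of the box. A minimal sketch of one way to convert the columns first, using pd.to_numeric; the name num_cols is only illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Same construction as in the question: the mixed-type array holds strings only
data = np.array([[5, 'red', 2, 6, 8, 10],
                 [11, 'red', 3, 9, 6, 15],
                 [8, 'blue', 0, 3, 5, 10],
                 [2, 'blue', 1, 2, 3, 4]])
df = pd.DataFrame(data, columns=['A', 'B', 'red_lower', 'red_upper',
                                 'blue_lower', 'blue_upper'])

# Every column except the color label 'B' is expected to hold numbers
num_cols = df.columns.drop('B')
df[num_cols] = df[num_cols].apply(pd.to_numeric)

print(df.dtypes)
```

Building the frame directly from a list of rows instead of a NumPy array would likely avoid the conversion altogether.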


values = df.A.values
colors = df.B.values

# Range columns and the color prefix of each one ('red', 'red', 'blue', 'blue')
range_frame = df.iloc[:, 2:]
ranges = range_frame.columns.str.split('_').str[0].values

# m[row, col] is True where the column does not belong to the row's color
m = colors[:, None] != ranges
masked = range_frame.mask(m)

# Push the two surviving bounds of every row to the left: (lower, upper)
jf = justify(masked.values, invalid_val=np.nan)[:, :2]
ir = np.logical_and(jf[:, 0] < values, values < jf[:, 1]).astype(int)

c1 = values <= jf[:, 0]
c2 = values >= jf[:, 1]

irl = np.select([c1, c2], ['below', 'above'], 'inside')

df.assign(in_range=ir, in_range_flag=irl)

    A     B  red_lower  red_upper  blue_lower  blue_upper  in_range in_range_flag
0   5   red          2          6           8          10         1        inside
1  11   red          3          9           6          15         0         above
2   8  blue          0          3           5          10         1        inside
3   2  blue          1          2           3           4         0         below
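
For readers who want to see the masking idea in isolation, here is a rough sketch that skips justify. It is a variation, not the answer's exact method, and it assumes each color has exactly one `_lower` column followed by one `_upper` column. The names prefixes, keep and bounds are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[5, 'red', 2, 6, 8, 10],
     [11, 'red', 3, 9, 6, 15],
     [8, 'blue', 0, 3, 5, 10],
     [2, 'blue', 1, 2, 3, 4]],
    columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'])

range_frame = df.iloc[:, 2:]
prefixes = range_frame.columns.str.split('_').str[0].to_numpy()

# keep[row, col] is True where column `col` belongs to the color of `row`
keep = df['B'].to_numpy()[:, None] == prefixes

# Boolean indexing flattens row by row, so each row contributes (lower, upper)
bounds = range_frame.to_numpy()[keep].reshape(-1, 2)

a = df['A'].to_numpy()
flag = np.select([a <= bounds[:, 0], a >= bounds[:, 1]],
                 ['below', 'above'], 'inside')
print(flag.tolist())
```

Note that the comparison array has one row per DataFrame row and one column per range column, which is the shape DataFrame.mask expects as well.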

stack + reshape + logical_and

Again making the same assumptions as the first answer.


# Long format: one row per (A, B, range column) with the bound in column 0
u = df.set_index(['A', 'B']).stack().rename_axis(['A', 'B', 'flag']).reset_index()
# Keep only the bounds whose column prefix matches the row's color
frame = u[u.flag.str.split('_').str[0] == u.B]

# Two rows (lower, upper) remain per original row
values = frame[::2].A.values
ranges = frame[0].values.reshape(-1, 2)

ir = np.logical_and(ranges[:, 0] < values, values < ranges[:, 1]).astype(int)

c1 = values <= ranges[:, 0]
c2 = values >= ranges[:, 1]

irl = np.select([c1, c2], ['below', 'above'], 'inside')

df.assign(in_range=ir, in_range_flag=irl)
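
To see what the stack step produces, here is a small sketch that rebuilds the sample frame with numeric columns (an assumption, as noted above) and prints the filtered long-format frame. Each original row is left with exactly two entries, lower bound first and upper bound second, which is why reshape(-1, 2) lines up with the rows.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 'red', 2, 6, 8, 10],
     [11, 'red', 3, 9, 6, 15],
     [8, 'blue', 0, 3, 5, 10],
     [2, 'blue', 1, 2, 3, 4]],
    columns=['A', 'B', 'red_lower', 'red_upper', 'blue_lower', 'blue_upper'])

# One row per (A, B, range column); the stacked values land in column 0
u = (df.set_index(['A', 'B'])
       .stack()
       .rename_axis(['A', 'B', 'flag'])
       .reset_index())

# Keep only the bounds that belong to the row's own color
frame = u[u.flag.str.split('_').str[0] == u.B]
print(frame)
```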

Here is the definition for the justify function by @Divakar:

def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as empty; pass np.nan to treat NaN entries as empty
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out
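
As a quick illustration of what justify does in this context, the sketch below applies it to a small hand-made array shaped like the masked range frame. The function body is repeated only so the snippet runs on its own; the sample array is made up for the example.

```python
import numpy as np

def justify(a, invalid_val=0, axis=1, side='left'):
    # Same logic as the definition above, repeated for a self-contained snippet
    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    justified_mask = np.sort(mask, axis=axis)
    if (side == 'up') | (side == 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    out = np.full(a.shape, invalid_val)
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

# A 'red' row keeps its first two columns, a 'blue' row its last two
masked = np.array([[2.0, 6.0, np.nan, np.nan],
                   [np.nan, np.nan, 5.0, 10.0]])

left = justify(masked, invalid_val=np.nan)
print(left[:, :2])  # the (lower, upper) pair of each row
```

After justification the valid bounds always sit in the first two columns, regardless of which color the row had.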

Here is an approach that uses groupby to split the df by color. Most of the work is handled inside the loop, which means you do not need to type in each color by hand.

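parts = []
for name, x in df.groupby('B', sort=False):
    bounds = x.filter(like=name)  # this color's lower and upper columns
    lower, upper = bounds.iloc[:, 0], bounds.iloc[:, 1]
    s1 = (x.A >= lower) & (x.A <= upper)
    s2 = x.A < lower
    flags = np.select([s1, s2], ['inside', 'below'], default='above')
    parts.append(pd.Series(flags, index=x.index))

# Assigning a Series aligns on the index, so rows need not be sorted by color
df['in_range'] = pd.concat(parts)
df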
Out[64]: 
    A     B  red_lower  red_upper  blue_lower  blue_upper in_range
0   5   red          2          6           8          10   inside
1  11   red          3          9           6          15    above
2   8  blue          0          3           5          10   inside
3   2  blue          1          2           3           4    below

