How to select rows from a data frame based on the values in a table of ranges

Question

I have a table of ranges (start,end):

name     blue         green          yellow        purple
a        1,5                         654,678       11,15
b                     88761,88776
c        1211,1215                   38,47
d        89,95                                     1567,1578

And a data frame like this:

Supplier        colour                   
Abi             1                               
John            678          
Smith           120               
Tim             1570 
Don             87560

How can I filter the df to contain only rows whose values in the colour column are within the ranges provided in the table? I'd like the final df to look like this:

Supplier        colour                   
Abi             1                               
John            678                         
Tim             1570

Thank you!
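
For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the two frames. The names df1 and df2 follow Answers 1 and 2. It assumes the blank cells in the range table hold a single space, which is what Answer 2 implies; the real data may store them differently (for example as NaN).

```python
import pandas as pd

# Range table. Assumption: blank cells contain a single space.
df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", " ", "1211,1215", "89,95"],
    "green": [" ", "88761,88776", " ", " "],
    "yellow": ["654,678", " ", "38,47", " "],
    "purple": ["11,15", " ", " ", "1567,1578"],
})

# Data frame to filter.
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})
```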

Answer 1

Try using a list comprehension and loc:

# Expand every "start,end" cell into its integers (end inclusive); cells without a comma are skipped
l = [x for i in df1[df1.columns[1:]].values.flatten().tolist() if ',' in str(i)
     for x in range(int(i.split(',')[0]), int(i.split(',')[1]) + 1)]
print(df2.loc[df2['colour'].isin(l)].reset_index(drop=True))

Output:

     Supplier        colour                   
0    Abi             1                               
1    John            678                         
2    Tim             1570
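
Expanding every range into its individual integers can use a lot of memory when the ranges are wide. As a different technique (not the one this answer uses), the values can be compared against the range bounds directly with NumPy broadcasting. This is a sketch under the assumption that blank cells are a single space or NaN; the names pairs, starts, ends, mask and result are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", " ", "1211,1215", "89,95"],
    "green": [" ", "88761,88776", " ", " "],
    "yellow": ["654,678", " ", "38,47", " "],
    "purple": ["11,15", " ", " ", "1567,1578"],
})
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Put all range cells in one column and keep only the "start,end" strings.
cells = df1.melt(id_vars="name")["value"]
pairs = cells[cells.str.contains(",", na=False)].str.split(",", expand=True).astype(int)
starts = pairs[0].to_numpy()
ends = pairs[1].to_numpy()

# Compare every colour value with every range: shape (n_rows, n_ranges).
values = df2["colour"].to_numpy()[:, None]
mask = ((values >= starts) & (values <= ends)).any(axis=1)

result = df2[mask].reset_index(drop=True)
print(result)
```

The comparison matrix has one entry per (row, range) pair, so its cost depends on the number of ranges rather than on how wide they are.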

Answer 2

Try:

First, replace the blank cells with NaN using the replace() method:

df1=df1.replace(r'^\s*$',float('NaN'),regex=True)
                  #^ matches cells that are empty or contain only whitespace

The idea is to turn the range strings into one collection of all the integer values they cover:

s=df1.set_index('name').stack().dropna() 
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()

Finally:

out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]

Output of out:

    Supplier    colour
0   Abi          1
1   John        678
3   Tim         1570

Answer 3

Use pd.cut and pd.IntervalIndex (here table is the table of ranges and supplier is the data frame to filter):

tups = table.set_index('name').unstack() \
            .replace(r'\s+', float('nan'), regex=True).dropna() \
            .apply(lambda x: tuple([int(i) for i in x.split(',')])).values

ii = pd.IntervalIndex.from_tuples(tups, closed='both')

out = supplier.loc[pd.cut(supplier['colour'], ii).dropna().index]

>>> out
  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
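
Note that pd.cut rejects an IntervalIndex whose intervals overlap. If the ranges in the table could overlap, one possible variation is to test membership with IntervalIndex.contains instead. The sketch below reuses a few ranges from the question and adds one hypothetical overlapping range, (3, 12), purely for illustration.

```python
import pandas as pd

supplier = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# (3, 12) is hypothetical: it overlaps (1, 5) and (11, 15), which pd.cut would not accept.
tups = [(1, 5), (11, 15), (654, 678), (1567, 1578), (3, 12)]
ii = pd.IntervalIndex.from_tuples(tups, closed="both")

# For each value, check whether at least one interval contains it.
mask = supplier["colour"].map(lambda v: bool(ii.contains(v).any())).astype(bool)

out = supplier[mask]  # keeps rows 0, 1 and 3 (Abi, John, Tim)
print(out)
```

This checks every value against every interval in Python, so it is likely slower than pd.cut on large frames.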
