
How to select rows from a data frame based on the values in a table of ranges

I have a table of ranges, where each cell holds a (start,end) pair:

name     blue         green          yellow        purple              
a        1,5                         654,678       11,15
b                     88761,88776  
c        1211,1215                   38,47    
d        89,95                                     1567,1578

And a data frame like this:

Supplier        colour                   
Abi             1                               
John            678          
Smith           120               
Tim             1570 
Don             87560                       

How can I filter the df to contain only rows whose values in the colour column are within the ranges provided in the table? I'd like the final df to look like this:

Supplier        colour                   
Abi             1                               
John            678                         
Tim             1570 

Thank you!

Try using a list comprehension and loc (here df1 is the table of ranges and df2 is the data frame to filter):

l = [x for i in df1[df1.columns[1:]].values.flatten().tolist() if ',' in str(i) for x in range(int(i.split(',')[0]), int(i.split(',')[1]) + 1)]
print(df2.loc[df2['colour'].isin(l)].reset_index(drop=True))

Output:

     Supplier        colour                   
0    Abi             1                               
1    John            678                         
2    Tim             1570 
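
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The way df1 and df2 are built is an assumption about how the question's data is stored (empty cells as empty strings), and expand_ranges is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Assumed representation of the question's data: empty cells are empty strings.
df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", "", "1211,1215", "89,95"],
    "green": ["", "88761,88776", "", ""],
    "yellow": ["654,678", "", "38,47", ""],
    "purple": ["11,15", "", "", "1567,1578"],
})
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})


def expand_ranges(table):
    """Hypothetical helper: collect every integer covered by the 'start,end' cells."""
    allowed = set()
    # Skip the first column ("name") and walk over every remaining cell.
    for cell in table[table.columns[1:]].to_numpy().ravel():
        if isinstance(cell, str) and "," in cell:
            start, end = (int(part) for part in cell.split(","))
            allowed.update(range(start, end + 1))  # the end value is inclusive
    return allowed


allowed = expand_ranges(df1)
result = df2.loc[df2["colour"].isin(allowed)].reset_index(drop=True)
print(result)  # keeps Abi (1), John (678) and Tim (1570)
```

Using a set instead of a list removes duplicate values if two ranges happen to overlap.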

Try:

First, replace the blank cells with NaN using the replace() method:

df1=df1.replace(r'\s+',float('NaN'),regex=True)
                  #^ any cell containing whitespace (here, the blank cells) becomes NaN

The idea is to turn the range strings into a single array of all the integer values they cover:

s=df1.set_index('name').stack().dropna() 
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()

Finally:

out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]

Output of out:

    Supplier    colour
0   Abi          1
1   John        678
3   Tim         1570
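
Expanding every range into its individual integers is fine for ranges this small, but it can use a lot of memory when ranges are wide. As a hedged alternative (this is a different technique from the answer above, not a restatement of it), the values can be compared against the start and end bounds directly with NumPy broadcasting. The data construction below is an assumption, with empty cells stored as empty strings.

```python
import numpy as np
import pandas as pd

# Assumed representation of the question's data: empty cells are empty strings.
df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", "", "1211,1215", "89,95"],
    "green": ["", "88761,88776", "", ""],
    "yellow": ["654,678", "", "38,47", ""],
    "purple": ["11,15", "", "", "1567,1578"],
})
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Flatten all range cells into one Series and keep only the "start,end" strings.
cells = pd.Series(df1.drop(columns="name").to_numpy().ravel())
cells = cells[cells.str.contains(",", na=False)]

# Two integer columns: start and end of each range.
bounds = cells.str.split(",", expand=True).astype(int).to_numpy()
starts, ends = bounds[:, 0], bounds[:, 1]

# Compare every colour value (as a column vector) with every range at once.
values = df2["colour"].to_numpy()[:, None]
mask = ((values >= starts) & (values <= ends)).any(axis=1)

out = df2[mask]
print(out)  # rows 0, 1 and 3 (Abi, John, Tim)
```

The intermediate boolean array has one row per data frame row and one column per range, so this trades the expanded list of integers for a rows-by-ranges comparison.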

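Use pd.cut and pd.IntervalIndex (here table is the table of ranges and supplier is the data frame; note that pd.cut only accepts an IntervalIndex whose intervals do not overlap):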

tups = table.set_index('name').unstack() \
            .replace(r'\s+', float('nan'), regex=True).dropna() \
            .apply(lambda x: tuple([int(i) for i in x.split(',')])).values

ii = pd.IntervalIndex.from_tuples(tups, closed='both')

out = supplier.loc[pd.cut(supplier['colour'], ii).dropna().index]

Output:

  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
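
Here is a self-contained sketch of the interval approach that builds a boolean mask with notna() rather than looking rows up by index label. The data construction is an assumption (empty cells stored as empty strings), and the explicit overlap check is an addition of mine to surface the non-overlapping requirement early.

```python
import pandas as pd

# Assumed representation of the question's data: empty cells are empty strings.
table = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", "", "1211,1215", "89,95"],
    "green": ["", "88761,88776", "", ""],
    "yellow": ["654,678", "", "38,47", ""],
    "purple": ["11,15", "", "", "1567,1578"],
})
supplier = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Parse each non-empty "start,end" cell into an integer (start, end) tuple.
cells = table.set_index("name").to_numpy().ravel()
tups = sorted(
    tuple(int(part) for part in cell.split(","))
    for cell in cells
    if isinstance(cell, str) and "," in cell
)

# closed="both" makes start and end inclusive.
ii = pd.IntervalIndex.from_tuples(tups, closed="both")
if ii.is_overlapping:
    raise ValueError("pd.cut needs non-overlapping intervals")

# Values that fall in no interval come back as NaN from pd.cut.
mask = pd.cut(supplier["colour"], ii).notna()
out = supplier[mask]
print(out)  # rows 0, 1 and 3 (Abi, John, Tim)
```

If the ranges in the table can overlap, this approach raises an error, and one of the other answers is the safer choice.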


 