I have a table of ranges (start,end):

name  blue       green        yellow  purple
a     1,5        654,678              11,15
b                88761,88776
c     1211,1215               38,47
d     89,95      1567,1578
And a data frame like this:
Supplier  colour
Abi       1
John      678
Smith     120
Tim       1570
Don       87560
How can I filter the df to contain only rows whose values in the colour column fall within the ranges provided in the table? I'd like the final df to look like this:
Supplier  colour
Abi       1
John      678
Tim       1570
Thank you!
Try using a list comprehension and loc:

l = [
    x
    for i in df1[df1.columns[1:]].values.flatten().tolist()
    if ',' in str(i)  # skips NaN/empty cells
    for x in range(int(i.split(',')[0]), int(i.split(',')[1]) + 1)
]
print(df2.loc[df2['colour'].isin(l)].reset_index(drop=True))
Output:
  Supplier  colour
0      Abi       1
1     John     678
2      Tim    1570
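The approach above can be sketched end to end. The frames below are a hypothetical reconstruction of the sample data from the question (the exact column placement of each range is assumed):

```python
import pandas as pd

# Hypothetical reconstruction of the question's data.
df1 = pd.DataFrame({
    'name':   ['a', 'b', 'c', 'd'],
    'blue':   ['1,5', None, '1211,1215', '89,95'],
    'green':  ['654,678', '88761,88776', None, '1567,1578'],
    'yellow': [None, None, '38,47', None],
    'purple': ['11,15', None, None, None],
})
df2 = pd.DataFrame({
    'Supplier': ['Abi', 'John', 'Smith', 'Tim', 'Don'],
    'colour':   [1, 678, 120, 1570, 87560],
})

# Expand every "start,end" cell into the full set of integers it covers.
allowed = [
    x
    for cell in df1[df1.columns[1:]].values.flatten().tolist()
    if ',' in str(cell)  # skips None/NaN cells
    for x in range(int(cell.split(',')[0]), int(cell.split(',')[1]) + 1)
]

out = df2.loc[df2['colour'].isin(allowed)].reset_index(drop=True)
print(out)
```

Note that this materializes every integer in every range, so it is fine for small ranges but wasteful if the ranges span millions of values.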
Try:
First, replace the empty cells (whitespace) with NaN via the replace() method:

df1=df1.replace(r'\s+',float('NaN'),regex=True)
#^ replaces one or more occurrences of whitespace
The idea is to turn the string ranges into an actual collection of expanded range values:
s=df1.set_index('name').stack().dropna()
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()
Finally:
out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]
Output of out:
  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
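Put together, the stack/explode approach looks like this; the frames are a hypothetical reconstruction of the question's data (with empty cells already as None, so the whitespace replace step is not needed):

```python
import pandas as pd

# Hypothetical reconstruction of the question's data.
df1 = pd.DataFrame({
    'name':   ['a', 'b', 'c', 'd'],
    'blue':   ['1,5', None, '1211,1215', '89,95'],
    'green':  ['654,678', '88761,88776', None, '1567,1578'],
    'yellow': [None, None, '38,47', None],
    'purple': ['11,15', None, None, None],
})
df2 = pd.DataFrame({
    'Supplier': ['Abi', 'John', 'Smith', 'Tim', 'Don'],
    'colour':   [1, 678, 120, 1570, 87560],
})

# Stack the range columns into one Series, drop empty cells,
# turn each "start,end" string into a range, then explode the
# ranges into individual values.
s = df1.set_index('name').stack().dropna()
s = s.str.split(',').map(lambda x: range(int(x[0]), int(x[1]) + 1)).explode().unique()

out = df2[df2['colour'].isin(s)]
print(out)
```

Unlike the first answer, this keeps the original row index (Tim stays at index 3), which is why the output below shows a gap at index 2.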
Use pd.cut and pd.IntervalIndex:
tups = table.set_index('name').unstack() \
            .replace(r'\s+', float('nan'), regex=True).dropna() \
            .apply(lambda x: tuple([int(i) for i in x.split(',')])).values
ii = pd.IntervalIndex.from_tuples(tups, closed='both')
out = supplier.loc[pd.cut(supplier['colour'], ii).dropna().index]
>>> out
  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
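A self-contained sketch of this interval-based approach, assuming the same hypothetical reconstruction of the question's data (empty cells as None, so the whitespace replace is skipped). Note pd.cut requires the intervals to be non-overlapping, which holds for this data:

```python
import pandas as pd

# Hypothetical reconstruction of the question's data.
table = pd.DataFrame({
    'name':   ['a', 'b', 'c', 'd'],
    'blue':   ['1,5', None, '1211,1215', '89,95'],
    'green':  ['654,678', '88761,88776', None, '1567,1578'],
    'yellow': [None, None, '38,47', None],
    'purple': ['11,15', None, None, None],
})
supplier = pd.DataFrame({
    'Supplier': ['Abi', 'John', 'Smith', 'Tim', 'Don'],
    'colour':   [1, 678, 120, 1570, 87560],
})

# Collect a (start, end) tuple from every non-empty cell.
tups = (
    table.set_index('name').unstack().dropna()
         .apply(lambda x: tuple(int(i) for i in x.split(','))).values
)

# Build closed intervals; pd.cut assigns each colour to its interval
# (NaN if it falls in none), so dropna().index keeps matching rows.
ii = pd.IntervalIndex.from_tuples(list(tups), closed='both')
out = supplier.loc[pd.cut(supplier['colour'], ii).dropna().index]
print(out)
```

This avoids materializing every integer in each range, so it scales better than the expansion-based answers when ranges are wide.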