Add/fill pandas column based on range in rows from another dataframe

Question

Working with pandas, I have df1 indexed by time samples:

import io

import pandas as pd

data = '''\
time       flags    input
8228835.0  53153.0  32768.0
8228837.0  53153.0  32768.0
8228839.0  53153.0  32768.0
8228841.0  53153.0  32768.0
8228843.0  61345.0  32768.0'''

# Parse the whitespace-delimited text, using 'time' as the index
fileobj = io.StringIO(data)
df1 = pd.read_csv(fileobj, sep=r'\s+', index_col='time')

df2 holds time ranges, each with a start and an end, during which the state of 'check' is True:

data = '''\
        check     start       end
20536   True   8228837   8228993
20576   True   8232747   8232869
20554   True   8230621   8230761
20520   True   8227351   8227507
20480   True   8223549   8223669
20471   True   8221391   8221553'''

import io

fileobj = io.StringIO(data)
df2 = pd.read_csv(fileobj, sep=r'\s+')

What I need to do is add a 'check' column to df1 and set it to True for every time that falls inside a range defined in df2. All other rows should be False. An example result would be:

             flags    input    check
time                       
8228835.0  53153.0  32768.0    False
8228837.0  53153.0  32768.0    True
8228839.0  53153.0  32768.0    True
8228841.0  53153.0  32768.0    True
8228843.0  61345.0  32768.0    True
....
8228994.0  12424.0  32768.0    False

Answer 1

You can build the ranges from df2 and then use pd.Index.isin with itertools.chain:

from itertools import chain

df2 = df2[df2['check']]

ranges = map(range, df2['start'], df2['end'])

df1['check'] = df1.index.isin(chain.from_iterable(ranges))

print(df1)

             flags    input  check
time                              
8228835.0  53153.0  32768.0  False
8228837.0  53153.0  32768.0   True
8228839.0  53153.0  32768.0   True
8228841.0  53153.0  32768.0   True
8228843.0  61345.0  32768.0   True
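
One thing to watch: range(start, end) leaves out end itself, so a sample that falls exactly on an end value would be marked False. If the end should count as part of the range, a minimal sketch could pass end + 1 instead. The small frames below are made-up toy data, not the frames from the question:

```python
from itertools import chain

import pandas as pd

# Hypothetical toy data, not the frames from the question.
df1 = pd.DataFrame(
    {"flags": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.Index([9.0, 10.0, 11.0, 12.0, 13.0], name="time"),
)
df2 = pd.DataFrame({"check": [True], "start": [10], "end": [12]})

# Keep only the rows whose 'check' flag is set.
active = df2[df2["check"]]

# range() stops before its second argument, so add 1 to include `end`.
ranges = map(range, active["start"], active["end"] + 1)

# Materialise the chained ranges so isin() receives a plain list.
df1["check"] = df1.index.isin(list(chain.from_iterable(ranges)))

print(df1["check"].tolist())  # [False, True, True, True, False]
```

Expanding every range into individual integers can get large when the ranges are long, so this is probably best suited to short ranges on integer-valued times.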

Answer 2

I think you can use IntervalIndex with loc (the output below comes from a small toy frame named df, not the question's data):

df2.index = pd.IntervalIndex.from_arrays(df2.start, df2.end, closed='both')
df2.loc[df.index]
Out[174]: 
        check  start  end
[1, 2]   True      1    2
[4, 5]   True      4    5
[7, 8]   True      7    8
df['newcol']=df2.loc[df.index].check.values.tolist()
df
Out[176]: 
       flags    input  newcol
flags                        
2          2  32768.0    True
4          4  32768.0    True
7          7  32768.0    True
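
Note that indexing an IntervalIndex with loc raises a KeyError when a value falls outside every interval, which is the case for the first sample in the question. A hedged alternative, assuming the intervals in df2 do not overlap, is IntervalIndex.get_indexer, which returns -1 for values that match no interval. The frames below are cut-down copies of the question's data:

```python
import pandas as pd

# Cut-down copies of the question's frames (only two of the df2 ranges).
df1 = pd.DataFrame(
    {"flags": [53153.0, 53153.0, 53153.0, 53153.0, 61345.0],
     "input": [32768.0] * 5},
    index=pd.Index(
        [8228835.0, 8228837.0, 8228839.0, 8228841.0, 8228843.0], name="time"
    ),
)
df2 = pd.DataFrame(
    {"check": [True, True],
     "start": [8228837, 8232747],
     "end": [8228993, 8232869]}
)

# Closed on both sides, so start and end are part of each interval.
intervals = pd.IntervalIndex.from_arrays(df2["start"], df2["end"], closed="both")

# Position of the interval containing each time, or -1 when there is none.
pos = intervals.get_indexer(df1.index)

# Look up 'check' for matched rows; the (pos >= 0) mask forces
# unmatched rows to False regardless of what check[-1] happens to be.
check = df2["check"].to_numpy()
df1["check"] = (pos >= 0) & check[pos]

print(df1["check"].tolist())  # [False, True, True, True, True]
```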

Answer 3

A list comprehension using any(). I have not measured its performance, so it would be nice if you could run %timeit on it for us!

df1['check'] = [any(start <= i <= end for start,end in 
                    zip(df2['start'], df2['end'])) for i in df1.index]

print(df1)

Returns:

             flags    input  check
time                              
8228835.0  53153.0  32768.0  False
8228837.0  53153.0  32768.0   True
8228839.0  53153.0  32768.0   True
8228841.0  53153.0  32768.0   True
8228843.0  61345.0  32768.0   True
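
The nested loop performs len(df1) × len(df2) comparisons in Python. The same inclusive test can be written with NumPy broadcasting, which does the same number of comparisons in compiled code at the cost of an intermediate boolean array of shape (len(df1), len(df2)). This is only a sketch on made-up toy data, and how it compares in speed on real data is untested here:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the frames from the question.
df1 = pd.DataFrame(
    {"flags": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index([5.0, 10.0, 12.0, 20.0], name="time"),
)
df2 = pd.DataFrame({"check": [True, True], "start": [10, 18], "end": [12, 19]})

times = df1.index.to_numpy()[:, None]   # shape (n_times, 1)
starts = df2["start"].to_numpy()        # shape (n_ranges,)
ends = df2["end"].to_numpy()            # shape (n_ranges,)

# Broadcasting compares every time against every range at once.
inside = (starts <= times) & (times <= ends)   # shape (n_times, n_ranges)

# A time is flagged when it lies inside at least one range.
df1["check"] = inside.any(axis=1)

print(df1["check"].tolist())  # [False, True, True, False]
```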

3 answers

solution1
2 ACCEPTED 2018-08-03 00:34:20

solution2
1 2018-08-03 00:00:54

solution3
0 2018-08-03 00:08:20
