Working with pandas, I have df1 indexed by time samples:
import io
import pandas as pd

data = '''\
time flags input
8228835.0 53153.0 32768.0
8228837.0 53153.0 32768.0
8228839.0 53153.0 32768.0
8228841.0 53153.0 32768.0
8228843.0 61345.0 32768.0'''
# io.StringIO stands in for pd.compat.StringIO, which newer pandas no longer provides
fileobj = io.StringIO(data)
df1 = pd.read_csv(fileobj, sep=r'\s+', index_col='time')
df2 defines time ranges by their start and end; within each range the state of 'check' is True:
data = '''\
check start end
20536 True 8228837 8228993
20576 True 8232747 8232869
20554 True 8230621 8230761
20520 True 8227351 8227507
20480 True 8223549 8223669
20471 True 8221391 8221553'''
fileobj = io.StringIO(data)
# the rows have one more field than the header, so the first column becomes the index
df2 = pd.read_csv(fileobj, sep=r'\s+')
What I need to do is add a 'check' column to df1 that is True for every time falling within one of the ranges defined in df2, and False everywhere else. An example result would be:
flags input check
time
8228835.0 53153.0 32768.0 False
8228837.0 53153.0 32768.0 True
8228839.0 53153.0 32768.0 True
8228841.0 53153.0 32768.0 True
8228843.0 61345.0 32768.0 True
....
8228994.0 12424.0 32768.0 False
You can make a list of ranges and then use pd.Index.isin with itertools.chain:
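from itertools import chain

df2 = df2[df2['check']]  # keep only the rows where check is True
# range() excludes its stop value, so use end + 1 to include each end time
ranges = map(lambda start, end: range(start, end + 1), df2['start'], df2['end'])
df1['check'] = df1.index.isin(chain.from_iterable(ranges))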
print(df1)
flags input check
time
8228835.0 53153.0 32768.0 False
8228837.0 53153.0 32768.0 True
8228839.0 53153.0 32768.0 True
8228841.0 53153.0 32768.0 True
8228843.0 61345.0 32768.0 True
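One caveat with the range-based answer: Python's range(start, end) stops before end, so each range's end time would otherwise be treated as outside. A minimal sketch of the same isin + chain idea with inclusive ends, assuming integer-valued times as in the question; the small frames below are made-up toy data, not the asker's:

```python
from itertools import chain

import pandas as pd

# Hypothetical toy data: float time index, integer range bounds
df1 = pd.DataFrame({'flags': [1, 2, 3, 4, 5]},
                   index=pd.Index([1.0, 3.0, 5.0, 7.0, 9.0], name='time'))
df2 = pd.DataFrame({'check': [True, True, False],
                    'start': [3, 9, 1],
                    'end': [5, 11, 2]})

active = df2[df2['check']]  # ranges where check is False are ignored
# range(s, e + 1) so the end time itself counts as inside the range
ranges = (range(s, e + 1) for s, e in zip(active['start'], active['end']))
df1['check'] = df1.index.isin(chain.from_iterable(ranges))
print(df1)
```

This expands every range into individual integers, so it only makes sense when times are whole numbers and the ranges are reasonably short.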
I think you can use IntervalIndex with loc (the output below comes from a small toy frame df, not the question's data):
# closed='both': each interval includes both its start and its end
df2.index = pd.IntervalIndex.from_arrays(df2.start, df2.end, closed='both')
df2.loc[df.index]
Out[174]:
check start end
[1, 2] True 1 2
[4, 5] True 4 5
[7, 8] True 7 8
df['newcol']=df2.loc[df.index].check.values.tolist()
df
Out[176]:
flags input newcol
flags
2 2 32768.0 True
4 4 32768.0 True
7 7 32768.0 True
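On recent pandas versions, df2.loc[...] with a list of labels raises a KeyError if any label is missing, i.e. if any time (such as 8228835.0 in the question) falls outside every interval. A sketch of a variant that avoids this, assuming the intervals do not overlap (true for the sample ranges): IntervalIndex.get_indexer returns the position of the interval containing each time, or -1 when there is none. The frames below are a hand-typed excerpt of the question's data:

```python
import pandas as pd

# Hand-typed excerpt of the question's data (times are floats in df1)
df1 = pd.DataFrame(
    {'flags': [53153.0, 53153.0, 53153.0, 53153.0, 61345.0],
     'input': [32768.0] * 5},
    index=pd.Index([8228835.0, 8228837.0, 8228839.0, 8228841.0, 8228843.0],
                   name='time'))
df2 = pd.DataFrame({'check': [True, True, True],
                    'start': [8228837, 8232747, 8230621],
                    'end': [8228993, 8232869, 8230761]})

active = df2[df2['check']]
# closed='both' so a time equal to start or end counts as inside
intervals = pd.IntervalIndex.from_arrays(active['start'], active['end'],
                                         closed='both')
# position of the containing interval for each time, -1 if none;
# get_indexer refuses to work if the intervals overlap
df1['check'] = intervals.get_indexer(df1.index) != -1
print(df1)
```

If the real ranges can overlap, get_indexer raises an error and one of the other approaches in this thread is the safer choice.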
A list comprehension using any(). I have no idea about the actual performance, though; it would be nice if you could run %timeit on it for us!
df1['check'] = [any(start <= i <= end
                    for start, end in zip(df2['start'], df2['end']))
                for i in df1.index]
print(df1)
Returns:
flags input check
time
8228835.0 53153.0 32768.0 False
8228837.0 53153.0 32768.0 True
8228839.0 53153.0 32768.0 True
8228841.0 53153.0 32768.0 True
8228843.0 61345.0 32768.0 True
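On the performance question: the comprehension makes one Python-level comparison per (time, range) pair. A vectorized sketch of the same inclusive start <= t <= end test, swapped in here using NumPy broadcasting, builds a len(df1) × len(df2) boolean matrix in one step, so it assumes that matrix fits in memory. The frames below are made-up toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df1 = pd.DataFrame({'flags': [10, 20, 30, 40]},
                   index=pd.Index([1.0, 4.0, 6.0, 12.0], name='time'))
df2 = pd.DataFrame({'check': [True, True],
                    'start': [2, 10],
                    'end': [4, 12]})

active = df2[df2['check']]
t = df1.index.to_numpy()[:, np.newaxis]   # shape (n_times, 1)
starts = active['start'].to_numpy()       # shape (n_ranges,)
ends = active['end'].to_numpy()
# inside[i, j] is True when time i lies in range j (both ends inclusive)
inside = (t >= starts) & (t <= ends)
df1['check'] = inside.any(axis=1)
print(df1)
```

Timing it against the comprehension on the real data with %timeit would show whether the extra memory is worth it.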