pandas efficient way to get first filtered row for each DatetimeIndex entry

Question

I have a DataFrame with the following structure:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3333 entries, 2000-01-03 00:00:00+00:00 to 2012-11-21 00:00:00+00:00
Data columns:
open          3333  non-null values
high          3333  non-null values
low           3333  non-null values
close         3333  non-null values
volume        3333  non-null values
amount        3333  non-null values
pct_change    3332  non-null values
dtypes: float64(7)

The pct_change column contains percent change data.

Given a filtered DatetimeIndex from the DataFrame above:

<class 'pandas.tseries.index.DatetimeIndex'>
[2000-03-01 00:00:00, ..., 2012-11-01 00:00:00]
Length: 195, Freq: None, Timezone: UTC

For each date in this index, I want to scan forward from that date and return the first row where the pct_change column is below -0.015.

I came up with this solution but it is very slow:

stops = []
# dates is the filtered DatetimeIndex
for d in dates:
    # starting from the signal date, find the date of the first row with pct_change below -0.015
    match = df[df["pct_change"] < -0.015].loc[d:][:1].index

    # pair the close on the signal date with the close on the matching date
    stops.append([df.loc[d, "close"], df.loc[match, "close"].values[0]])

Any suggestions on how I can improve this?

Answer 1

You may find it faster to extract the index as a column and use apply and bfill.
Something like this:

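import pandas as pd

# copy the index into a column so each row can see its own timestamp
df['datetime'] = df.index
# keep the timestamp where pct_change is below the threshold, NaT elsewhere
df['stops'] = df.apply(lambda x: x['datetime']
                                 if x['pct_change'] < -0.015
                                 else pd.NaT,
                        axis=1)
# back-fill so each row carries the date of the next qualifying row
df['stops'] = df['stops'].bfill()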
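
The same idea can probably be expressed without a row-wise lambda. The following is only a sketch under assumptions: the small frame is made-up sample data (not the question's data), and hit_time is a hypothetical name. It uses Series.where to keep a timestamp only where the condition holds, then back-fills.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame
idx = pd.date_range("2000-01-03", periods=6, freq="D", tz="UTC")
df = pd.DataFrame(
    {"close": [10.0, 10.1, 9.9, 10.0, 9.8, 9.9],
     "pct_change": [float("nan"), 0.01, -0.02, 0.01, -0.02, 0.01]},
    index=idx,
)

# Keep the timestamp where pct_change < -0.015, NaT elsewhere
hit_time = df.index.to_series().where(df["pct_change"] < -0.015)

# Back-fill so each row carries the timestamp of the next qualifying row
df["stops"] = hit_time.bfill()
```

Rows after the last qualifying row have nothing to back-fill from, so their stops value stays NaT.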

Answer 2

How about this:

result = df[df['pct_change'] < -0.015].reindex(filtered_dates, method='bfill')

The one caveat is that if an interval does NOT contain a value below -0.015, the back-fill retrieves the first match from a later interval. To handle this, add a column containing each row's own date so you can see which timestamp each result came from, then set a row to NA if the retrieved timestamp exceeds the next "bin edge".
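
One way the bin-edge check described above might look is sketched here. This is an illustration under assumptions, not the answerer's code: the sample data is made up, the column is selected with df["pct_change"] because attribute access would resolve to the DataFrame.pct_change method, and hits, hit_time, next_edge and too_late are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data; values are illustrative only
idx = pd.date_range("2000-01-03", periods=8, freq="D", tz="UTC")
df = pd.DataFrame(
    {"close": [10.0, 10.1, 9.9, 10.0, 10.1, 10.2, 10.0, 10.1],
     "pct_change": [float("nan"), 0.01, -0.02, 0.01, 0.01, 0.01, -0.02, 0.01]},
    index=idx,
)
# Signal dates (the "bin edges"), assumed sorted and taken from df.index
filtered_dates = idx[[0, 3, 4]]

# Rows that satisfy the condition, keeping their own timestamp as a column
hits = df[df["pct_change"] < -0.015].copy()
hits["hit_time"] = hits.index

# For each signal date, take the first qualifying row at or after it
result = hits.reindex(filtered_dates, method="bfill")

# Next bin edge for each signal date (NaT for the last one)
next_edge = pd.Series(filtered_dates, index=filtered_dates).shift(-1)

# A match is too late if its timestamp exceeds the next bin edge
too_late = next_edge.notna() & (result["hit_time"] > next_edge)

# Drop the late matches, then reindex so those signal dates become NA rows
result = result[~too_late].reindex(filtered_dates)
```

With this sample, the second signal date has no qualifying row before the next edge, so its row ends up all NA. Using >= instead of > would treat a match that falls exactly on the next edge as belonging to the next interval only.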

2 answers

solution1 2 2012-12-29 21:41:07

solution2 2 ACCEPTED 2013-01-02 19:58:16
