pandas efficient way to get first filtered row for each DatetimeIndex entry

I have a DataFrame with the following structure:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3333 entries, 2000-01-03 00:00:00+00:00 to 2012-11-21 00:00:00+00:00
Data columns:
open          3333  non-null values
high          3333  non-null values
low           3333  non-null values
close         3333  non-null values
volume        3333  non-null values
amount        3333  non-null values
pct_change    3332  non-null values
dtypes: float64(7)

The pct_change column contains percent change data.

Given a filtered DatetimeIndex from the DataFrame above:

<class 'pandas.tseries.index.DatetimeIndex'>
[2000-03-01 00:00:00, ..., 2012-11-01 00:00:00]
Length: 195, Freq: None, Timezone: UTC

Starting from each date in that index, I want to return the first row on or after it where the pct_change column is below -0.015.

I came up with this solution but it is very slow:

stops = []
# dates is the filtered DatetimeIndex shown above
for d in dates:
    # find the first row on or after d where pct_change is below -0.015
    match = df[df["pct_change"] < -0.015].loc[d:][:1].index

    # pair the close on the signal date with the close on the matching date
    stops.append([df.loc[d, "close"], df.loc[match, "close"].values[0]])

Any suggestions on how I can improve this?

You may find it faster to extract the index as a column and use apply and bfill.
Something like this:

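import pandas as pd

df['datetime'] = df.index
# keep the timestamp where pct_change is below the threshold, NaT elsewhere
df['stops'] = df.apply(lambda x: x['datetime']
                                 if x['pct_change'] < -0.015
                                 else pd.NaT,
                        axis=1)
# back-fill so every row carries the date of the next match
df['stops'] = df['stops'].bfill()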
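
The same idea can probably be written without a row-wise apply, which tends to be the slow part. The following is only a sketch on made-up toy data: the names hit, stop_date, stops_at_signals and stop_close are mine, not from the question, and dates stands in for the filtered index.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the question's DataFrame (values are made up).
idx = pd.date_range("2000-01-03", periods=8, freq="D", tz="UTC")
df = pd.DataFrame(
    {
        "close": [10.0, 10.1, 9.9, 10.0, 9.8, 9.9, 10.0, 9.7],
        "pct_change": [np.nan, 0.01, -0.02, 0.01, -0.02, 0.01, 0.01, -0.03],
    },
    index=idx,
)

# Boolean mask of rows below the threshold (NaN compares as False).
hit = df["pct_change"] < -0.015

# Keep the timestamp where the mask is True, NaT elsewhere, then back-fill
# so every row carries the date of the next match on or after it.
stop_date = pd.Series(df.index, index=df.index).where(hit).bfill()

# Hypothetical signal dates, playing the role of the filtered index.
dates = idx[[0, 3]]
stops_at_signals = stop_date.loc[dates]

# Close on each matching date; NaT (no later match) would give NaN here.
stop_close = df["close"].reindex(pd.DatetimeIndex(stops_at_signals)).to_numpy()
```

A signal date that is itself below the threshold maps to itself, which matches the d: slice in the question's loop.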

How about this:

result = df[df["pct_change"] < -0.015].reindex(filtered_dates, method='bfill')

The only problem with this is that if an interval does NOT contain a value below -0.015, it will retrieve one from a future interval. To catch that, add a column containing each row's own date before reindexing, so you can see which date every retrieved row came from, then set a row to NA if its retrieved timestamp falls past the next "bin edge".
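
A rough sketch of that masking step, on made-up toy data. The column name hit_date and the helpers next_edge and leaked are my own, and I am assuming a match that lands exactly on the next bin edge belongs to the next interval (hence >=); change it to > if you want the other convention.

```python
import numpy as np
import pandas as pd

# Toy data (made up): matches fall on positions 2 and 7 only.
idx = pd.date_range("2000-01-03", periods=8, freq="D", tz="UTC")
df = pd.DataFrame(
    {
        "close": [10.0, 10.1, 9.9, 10.0, 10.1, 10.2, 10.3, 10.0],
        "pct_change": [np.nan, 0.01, -0.02, 0.01, 0.01, 0.01, 0.01, -0.03],
    },
    index=idx,
)
filtered_dates = idx[[0, 3, 6]]

# Rows below the threshold, each remembering its own date.
hits = df[df["pct_change"] < -0.015].copy()
hits["hit_date"] = hits.index

# For each filtered date, take the first matching row on or after it.
result = hits.reindex(filtered_dates, method="bfill")

# The next bin edge for each filtered date; the last one has none (NaT).
next_edge = pd.Series(filtered_dates, index=filtered_dates).shift(-1)

# A row "leaked" if its match lies at or beyond the next bin edge.
# Comparisons against NaT are False, so the last interval stays open-ended.
leaked = result["hit_date"] >= next_edge

# Blank out the leaked rows.
result.loc[leaked, ["close", "pct_change"]] = np.nan
result.loc[leaked, "hit_date"] = pd.NaT
```

Here the middle interval has no match of its own, so its row ends up as NA instead of borrowing the match from the last interval.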
