I have a DataFrame with the following structure:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3333 entries, 2000-01-03 00:00:00+00:00 to 2012-11-21 00:00:00+00:00
Data columns:
open 3333 non-null values
high 3333 non-null values
low 3333 non-null values
close 3333 non-null values
volume 3333 non-null values
amount 3333 non-null values
pct_change 3332 non-null values
dtypes: float64(7)
The pct_change column contains percent change data.
Given a filtered DatetimeIndex from the DataFrame above:
<class 'pandas.tseries.index.DatetimeIndex'>
[2000-03-01 00:00:00, ..., 2012-11-01 00:00:00]
Length: 195, Freq: None, Timezone: UTC
Starting from each date entry, I want to return the first subsequent row where the pct_change column is below -0.015.
I came up with this solution but it is very slow:
stops = []
# dates is the filtered DatetimeIndex of signal dates
for d in dates:
    # first row on or after the signal date where pct_change is below -0.015
    match = df[df["pct_change"] < -0.015].loc[d:][:1].index
    stops.append([df.loc[d, "close"], df.loc[match, "close"].values[0]])
Any suggestions on how I can improve this?
How about this:
result = df[df.pct_change < -0.015].reindex(filtered_dates, method='bfill')
The only problem with this is that if an interval does NOT contain a value below -0.015, reindex will pull one from a future interval. If you add a column containing each row's own date, you can see where each retrieved row came from, and set rows to NA when that timestamp exceeds the next "bin edge" (the following signal date).
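A minimal, self-contained sketch of that masking step. The data, the `dates` index, and the `hit_date`/`edges` names are made up for illustration, not from the original post:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the original OHLC frame (values are invented).
idx = pd.date_range("2000-01-03", periods=10, freq="D", tz="UTC")
df = pd.DataFrame(
    {
        "close": np.linspace(100.0, 109.0, 10),
        "pct_change": [0.01, -0.02, 0.0, 0.01, 0.005,
                       0.0, 0.01, 0.02, 0.01, -0.02],
    },
    index=idx,
)
dates = pd.DatetimeIndex([idx[0], idx[4], idx[8]])  # filtered signal dates

# Rows that breach the threshold, back-filled onto the signal dates.
hits = df[df["pct_change"] < -0.015].copy()
hits["hit_date"] = hits.index  # remember where each retrieved row came from
result = hits.reindex(dates, method="bfill")

# "Bin edges": the next signal date; for the last bin, one day past the data.
edges = dates[1:].append(pd.DatetimeIndex([df.index[-1] + pd.Timedelta(days=1)]))
next_edge = pd.Series(edges, index=dates)

# NA out hits that spilled past the next signal date.
spilled = result["hit_date"] >= next_edge
result.loc[spilled, ["close", "pct_change"]] = np.nan
```

Here the middle signal date has no breach before the next signal, so its back-filled row came from a later interval and gets masked, while the other two rows survive.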