I'm trying to add a new column to a pandas DataFrame that holds the maximum value of all the following records in the dataset, i.e. the maximum from the current row + 1 to the end of the dataset.
The dataset looks like this:
datetime | price | max_future_price |
---|---|---|
2021-02-25 10:00:00 | 10.00 | |
2021-02-25 10:00:01 | 10.01 | |
2021-02-25 10:00:02 | 10.00 | |
2021-02-25 10:00:03 | 09.99 | |
I am using a for loop and the shift function (bad, I know), but it takes forever on larger datasets. Is there a better, more scalable solution? I have spent a fair few hours searching and trying to trial-and-error my way through it with no luck. Thanks!
for row in range(len(df)):
    max_future_price = df.price.iloc[row+1:].max()
    max_future_return = round(((max_future_price - df.price.iloc[row])/df.price.iloc[row]),4)
    df.max_future_price.iloc[row] = max_future_return
You can reverse your price column and use cummax to determine your max_future_price.
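# Reverse, take the running max, restore the order, then shift up one row
# so each row only sees the prices after it (the last row becomes NaN).
df['max_future_price'] = df['price'].iloc[::-1].cummax().iloc[::-1].shift(-1)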
df['max_future_return'] = df.max_future_price.subtract(df.price).divide(df.price)
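Here is a minimal, self-contained sketch of the reversed-cummax idea, run on the sample data from the question. It assumes the maximum should cover only the rows after the current one, as the question describes, which is why the result is shifted up by one row. The function name `add_max_future` is hypothetical, and the rounding to 4 decimals mirrors the loop in the question.

```python
import pandas as pd


def add_max_future(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with max_future_price and max_future_return added."""
    out = df.copy()
    # Reverse the prices, take the running maximum, then restore the original
    # order. Each row now holds the max from that row to the end (inclusive).
    max_from_here = out["price"].iloc[::-1].cummax().iloc[::-1]
    # Shift up one row so the current price is excluded; the last row has no
    # future rows and becomes NaN.
    out["max_future_price"] = max_from_here.shift(-1)
    out["max_future_return"] = (
        (out["max_future_price"] - out["price"]) / out["price"]
    ).round(4)
    return out


# Sample data copied from the question.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2021-02-25 10:00:00",
                "2021-02-25 10:00:01",
                "2021-02-25 10:00:02",
                "2021-02-25 10:00:03",
            ]
        ),
        "price": [10.00, 10.01, 10.00, 9.99],
    }
)

result = add_max_future(df)
print(result)
```

Because cummax makes a single pass over the column, this avoids recomputing a maximum over the remaining rows for every row, which is what made the loop slow on larger datasets.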