The output column below is what I'm trying to calculate: for each row, it is the sum of the differences, in days, between that row's date and every earlier date. The diffs column lists those differences (dates are dd/mm/yyyy).
+------------+--------+-------------+
| date | output | diffs |
+------------+--------+-------------+
| 01/01/2000 | | |
| 10/01/2000 | 9 | [9] |
| 20/01/2000 | 29 | [10, 19] |
| 25/01/2000 | 44 | [5, 15, 24] |
+------------+--------+-------------+
I've thought about using rolling, creating a new column of diffs within each window based on the last record in the current window, and then summing these. However, rolling doesn't seem to be able to anchor the window at the beginning of the DataFrame. I suppose I could calculate the difference between the minimum and maximum dates and use that as the rolling period, but that seems hacky.
I've also looked at expanding, but I couldn't see a way of creating new diffs as the window expanded.
Is there a non-loop, hopefully vectorisable, solution to this?
Here's the DataFrame:
import datetime as dt
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'date': (
            dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 10),
            dt.datetime(2000, 1, 20), dt.datetime(2000, 1, 25),
        ),
        'output': (np.nan, 9, 29, 44),  # target values
    }
)
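For reference, a plain double loop that computes output straight from its definition can serve as a check for any vectorised answer. This is a sketch; `output_by_loop` is a hypothetical helper name, not something from the question, and it is quadratic in the number of rows.

```python
import datetime as dt

import numpy as np
import pandas as pd


def output_by_loop(dates: pd.Series) -> pd.Series:
    """Hypothetical reference helper: sum of day gaps from each date to all earlier dates."""
    values = []
    for i in range(len(dates)):
        if i == 0:
            values.append(np.nan)  # no earlier dates, matching the blank first row
            continue
        # Differences between date i and every earlier date k, in whole days
        diffs = [(dates.iloc[i] - dates.iloc[k]).days for k in range(i)]
        values.append(sum(diffs))
    return pd.Series(values, index=dates.index, dtype=float)


df = pd.DataFrame({
    'date': pd.to_datetime([
        dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 10),
        dt.datetime(2000, 1, 20), dt.datetime(2000, 1, 25),
    ]),
})
df['output'] = output_by_loop(df['date'])
print(df)
```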
If you're looking for output, try:
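# Gap in days between each date and the previous one
datediff = df.date.diff()/pd.Timedelta('1D')
# Weight gap i by i (the number of earlier dates it adds to), then accumulate
df['output'] = (datediff * np.arange(len(df))).cumsum()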
Output:
date output
0 2000-01-01 NaN
1 2000-01-10 9.0
2 2000-01-20 29.0
3 2000-01-25 44.0
I'll leave it to you to work out the logic behind it.
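One way to see the logic: write t_i for date i in days and output_i for the sum of t_i - t_k over all k < i. Moving from row i-1 to row i adds the new gap d_i = t_i - t_(i-1) once for each of the i earlier dates, so output_i = output_(i-1) + i * d_i, which is exactly a cumulative sum of np.arange(n) * diff(). A quick numeric check, assuming the example dates expressed as integer day offsets from 1 January 2000:

```python
import numpy as np

# Day offsets of the example dates from 1 Jan 2000 (assumed integers for illustration)
t = np.array([0, 9, 19, 24])

# Definition: output[i] = sum over k < i of (t[i] - t[k])
direct = np.array([sum(t[i] - t[k] for k in range(i)) for i in range(len(t))])

# Recurrence: output[i] = output[i-1] + i * (t[i] - t[i-1]), with output[0] = 0
gaps = np.diff(t, prepend=t[0])  # [0, 9, 10, 5]
recurrence = np.cumsum(np.arange(len(t)) * gaps)

print(direct)      # [ 0  9 29 44]
print(recurrence)  # [ 0  9 29 44]
```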
We may still need a for loop, but we can use NumPy broadcasting to reduce the calculation time:
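s = df.date.values
# (s[:, None] - s)[x, k] is date x minus date k; keep the first x entries of row x, nearest date first
df['new'] = [y[:x][::-1] for x, y in enumerate((s[:, None] - s).astype('timedelta64[D]'))]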
df
date output new
0 2000-01-01 NaN []
1 2000-01-10 9.0 [9 days]
2 2000-01-20 29.0 [10 days, 19 days]
3 2000-01-25 44.0 [5 days, 15 days, 24 days]
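To see what the comprehension slices, here is the pairwise matrix for the example dates, converted to float days (dividing by np.timedelta64(1, 'D') rather than using astype is a choice made here for readability). Row x holds date x minus every date, so y[:x] keeps only the earlier dates and [::-1] puts the nearest one first:

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(['2000-01-01', '2000-01-10', '2000-01-20', '2000-01-25'])
s = dates.values  # numpy datetime64[ns] array

# diff_days[x, k] = (date x - date k) in days, built by broadcasting (n x n)
diff_days = (s[:, None] - s) / np.timedelta64(1, 'D')
print(diff_days)

# Row x, first x entries (earlier dates), reversed so the nearest date comes first
diffs = [row[:x][::-1] for x, row in enumerate(diff_days)]
print(diffs[3])  # [ 5. 15. 24.]
```

Note that the full n x n matrix is built, so memory grows quadratically with the number of rows.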
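For your output, sum the differences to earlier dates in each row (the first row comes out as 0 rather than NaN):
df['output'] = np.tril((s[:, None] - s) / np.timedelta64(1, 'D')).sum(axis=1)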
Using NumPy broadcasting without looping:
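# Days since the first date (dt.day would wrap around at month boundaries)
i = ((df.date - df.date.iloc[0]) / pd.Timedelta('1D')).values
# (i - i[:, None])[r, c] = i[c] - i[r]; triu keeps r <= c, so each column sum is that row's output
df['output'] = np.triu(i - i[:, None]).sum(axis=0)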
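To see why summing over axis 0 works, here is the upper-triangular matrix that gets summed, built from the example's day offsets (assumed here as days since 1 January 2000; only differences matter, so any common origin gives the same result). Entry [r, c] is i[c] - i[r]; np.triu zeroes the entries where r > c, so column c collects the gaps from date c back to every date up to and including itself (the diagonal is 0):

```python
import numpy as np

# Days since the first date for the example rows (1, 10, 20, 25 January 2000)
i = np.array([0.0, 9.0, 19.0, 24.0])

# m[r, c] = i[c] - i[r]; triu keeps only r <= c, i.e. dates no later than date c
m = np.triu(i - i[:, None])
print(m)

output = m.sum(axis=0)
print(output)  # [ 0.  9. 29. 44.]
```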