I have some timestamped data, and I would like to run an expanding sum, that will refresh, say every day at 7:00 (restart from zero), kind of a "saw-teeth" sum. How can I do that in pandas? Thank you very much, JT2
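Use groupby() on the floor("D") of the date. To reset at 7:00 instead of midnight, subtract 7 hours before flooring. Then use transform("cumsum"), so the running total has the same number of rows as the original DataFrame.
import pandas as pd
import random
from datetime import datetime

# Sample data: one random value every 15 minutes
df = pd.DataFrame([{'DATE':d, "value":random.randint(0,10)}
                   for d in pd.date_range(start=datetime(2020,7,24),end=datetime(2020,7,30), freq="15min")])
# Subtracting 1970-01-01 07:00 gives a Timedelta shifted by 7 hours;
# flooring it to whole days yields one group key per 07:00-to-07:00 window
df["cumsum"] = df.groupby((df["DATE"]-pd.Timestamp(1970,1,1,7)).dt.floor("D"))["value"].transform("cumsum")
# Show the rows around the 07:00 reset
df[df["DATE"].dt.hour.isin([6,7])][:15]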
Output:
DATE                 value  cumsum
2020-07-24 06:00:00      3     137
2020-07-24 06:15:00      0     137
2020-07-24 06:30:00      6     143
2020-07-24 06:45:00      7     150
2020-07-24 07:00:00      0       0
2020-07-24 07:15:00      3       3
2020-07-24 07:30:00     10      13
2020-07-24 07:45:00      5      18
2020-07-25 06:00:00      6     459
2020-07-25 06:15:00     10     469
2020-07-25 06:30:00      8     477
2020-07-25 06:45:00      8     485
2020-07-25 07:00:00      3       3
2020-07-25 07:15:00      4       7
2020-07-25 07:30:00      0       7
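Because the sample above uses random values, the reset is easier to check on deterministic data. The following is a minimal sketch of the same idea, assuming hourly rows and a constant value of 1 (both chosen purely for illustration); it subtracts pd.Timedelta(hours=7) from the timestamps, which should give the same grouping as the epoch-based subtraction.

```python
import pandas as pd

# Illustrative data: one row per hour, spanning two 07:00 resets
df = pd.DataFrame({
    "DATE": pd.date_range("2020-07-24 05:00", "2020-07-25 08:00", freq="1h"),
})
df["value"] = 1  # constant values make the running total easy to read

# Shift timestamps back by 7 hours so that 07:00 lands on midnight,
# then floor to the day to get one key per 07:00-to-07:00 window
day_key = (df["DATE"] - pd.Timedelta(hours=7)).dt.floor("D")

# Running total that restarts whenever the key changes
df["cumsum"] = df.groupby(day_key)["value"].cumsum()

print(df[df["DATE"].dt.hour.isin([6, 7])])
```

With a constant value of 1, the running total counts the rows since the last reset, so every 07:00 row should show 1.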
Assume that your DataFrame contains:
Dat               Amount
2020-07-01 10:00    10.0
2020-07-02 06:50     3.1
2020-07-02 07:00     1.0
2020-07-02 08:10     2.1
2020-07-03 05:00     3.2
2020-07-03 10:00    12.0
2020-07-03 13:10     8.0
To perform your grouping and expanding sum, you can run:
df.groupby(pd.Grouper(key='Dat', freq='24H', base=7)).Amount.expanding().sum()
For the above data sample, the result is:
Dat
2020-07-01 07:00:00  0    10.0
                     1    13.1
2020-07-02 07:00:00  2     1.0
                     3     3.1
                     4     6.3
2020-07-03 07:00:00  5    12.0
                     6    20.0
Name: Amount, dtype: float64
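The grouping is by 24-hour periods (days), and base=7 shifts the start of each period by 7 hours, so every "day" runs from 07:00 to 07:00. Note that base was deprecated in pandas 1.1 and removed in 2.0, where offset='7h' plays the same role.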
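A sketch of the same grouping written with the offset argument of pd.Grouper, for pandas versions that no longer accept base. It rebuilds the sample data shown above; the column name running is an illustrative assumption, and the groupby cumsum variant is added only because it keeps the original row index.

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "Dat": pd.to_datetime([
        "2020-07-01 10:00", "2020-07-02 06:50", "2020-07-02 07:00",
        "2020-07-02 08:10", "2020-07-03 05:00", "2020-07-03 10:00",
        "2020-07-03 13:10",
    ]),
    "Amount": [10.0, 3.1, 1.0, 2.1, 3.2, 12.0, 8.0],
})

# 24-hour bins whose edges are shifted to 07:00 by the offset
grouper = pd.Grouper(key="Dat", freq="24h", offset="7h")

# Expanding sum per bin; the result is indexed by (bin start, original row)
result = df.groupby(grouper)["Amount"].expanding().sum()
print(result)

# Same running total, aligned with the original rows
df["running"] = df.groupby(grouper)["Amount"].cumsum()
print(df)
```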