
Cumulative sum, refreshing at intervals, python pandas

I have some timestamped data and would like to run an expanding sum that resets (restarts from zero) at, say, 7:00 every day: a kind of "sawtooth" sum. How can I do that in pandas? Thank you very much, JT2

  1. The simplest case is to groupby() the floor("D") of the date. To reset at 7:00 instead of midnight, subtract 7 hours before taking the floor.
  2. Then use transform("cumsum") to get a running total with the same number of rows as the original DataFrame.
  3. The results below show the 6 am and 7 am rows, where you can see the totals resetting.
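import random
from datetime import datetime

import pandas as pd

# Sample data: one random value every 15 minutes over six days
df = pd.DataFrame([{"DATE": d, "value": random.randint(0, 10)}
 for d in pd.date_range(start=datetime(2020, 7, 24), end=datetime(2020, 7, 30), freq="15min")])

# Shift timestamps back 7 hours so that flooring to the day puts the boundary at 07:00
df["cumsum"] = df.groupby((df["DATE"] - pd.Timedelta(hours=7)).dt.floor("D"))["value"].transform("cumsum")

# Show the rows around the 07:00 reset
df[df["DATE"].dt.hour.isin([6, 7])][:15]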

output

               DATE  value  cumsum
2020-07-24 06:00:00      3     137
2020-07-24 06:15:00      0     137
2020-07-24 06:30:00      6     143
2020-07-24 06:45:00      7     150
2020-07-24 07:00:00      0       0
2020-07-24 07:15:00      3       3
2020-07-24 07:30:00     10      13
2020-07-24 07:45:00      5      18
2020-07-25 06:00:00      6     459
2020-07-25 06:15:00     10     469
2020-07-25 06:30:00      8     477
2020-07-25 06:45:00      8     485
2020-07-25 07:00:00      3       3
2020-07-25 07:15:00      4       7
2020-07-25 07:30:00      0       7
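
To check the reset behaviour without random data, here is a minimal sketch that wraps the same approach in a helper. The function name sawtooth_cumsum and its reset_hour parameter are hypothetical, introduced only for illustration; every row has a value of 1 so the running total can be verified by counting rows.

```python
import pandas as pd


def sawtooth_cumsum(df, time_col, value_col, reset_hour=7):
    """Running total of value_col that restarts at reset_hour each day.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Shift back by reset_hour so that flooring to the day splits at reset_hour
    key = (df[time_col] - pd.Timedelta(hours=reset_hour)).dt.floor("D")
    return df.groupby(key)[value_col].transform("cumsum")


# Hourly rows, each with value 1, spanning two full days
demo = pd.DataFrame(
    {"DATE": pd.date_range("2020-07-24 00:00", "2020-07-25 23:00", freq="60min")}
)
demo["value"] = 1
demo["cumsum"] = sawtooth_cumsum(demo, "DATE", "value")

# Rows just before and at the reset: the total drops back to 1 at 07:00
print(demo[demo["DATE"].dt.hour.isin([6, 7])])
```

With this data the total at 06:00 on the second day should equal 24 (one full 07:00-to-07:00 window of hourly rows) and drop back to 1 at 07:00.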

Assume that your DataFrame contains:

Dat               Amount
2020-07-01 10:00   10.0
2020-07-02 06:50    3.1
2020-07-02 07:00    1.0
2020-07-02 08:10    2.1
2020-07-03 05:00    3.2
2020-07-03 10:00   12.0
2020-07-03 13:10    8.0

To perform your grouping and expanding sum, you can run:

df.groupby(pd.Grouper(key='Dat', freq='24H', base=7)).Amount.expanding().sum()

For the above data sample, the result is:

Dat                   
2020-07-01 07:00:00  0    10.0
                     1    13.1
2020-07-02 07:00:00  2     1.0
                     3     3.1
                     4     6.3
2020-07-03 07:00:00  5    12.0
                     6    20.0
Name: Amount, dtype: float64

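This groups by 24-hour periods (days), with base shifting the start of each period by 7 hours. Note that base was deprecated in pandas 1.1 and removed in pandas 2.0; on newer versions, pass offset='7h' to pd.Grouper instead.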
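
On pandas versions where base is no longer accepted, the same grouping can be sketched with the offset parameter of pd.Grouper. This is a minimal sketch, assuming pandas 1.1 or later and the sample data from this answer; the running column name is illustrative, and cumsum() is swapped in for expanding().sum() so that the result lines up with the original rows.

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame({
    "Dat": pd.to_datetime([
        "2020-07-01 10:00", "2020-07-02 06:50", "2020-07-02 07:00",
        "2020-07-02 08:10", "2020-07-03 05:00", "2020-07-03 10:00",
        "2020-07-03 13:10",
    ]),
    "Amount": [10.0, 3.1, 1.0, 2.1, 3.2, 12.0, 8.0],
})

# offset="7h" starts each 24-hour bin at 07:00 instead of midnight
grouper = pd.Grouper(key="Dat", freq="24h", offset="7h")

# cumsum() keeps the original row index, so it can be assigned as a column
df["running"] = df.groupby(grouper)["Amount"].cumsum()
print(df)
```

Unlike expanding().sum(), which returns a Series with a MultiIndex of (group start, row), cumsum() returns one value per original row, which may be more convenient when the running total is wanted as a new column.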


 