Calculate a sum or average between two dates

I have some code in which I want to calculate a sum or an average between two dates.

Example data that is in a Pandas DataFrame:

data = {'Dates':['2022-12-22','2021-01-01','2021-01-05','2021-01-07','2021-01-10','2021-01-13'],
        'Value':['4','3','6','5','4','3']}

I want to loop through all dates, not just those in the column, so from a start date to an end date, and for each date get the average or sum of the values in the preceding period, for example the previous 14 days.

So I'm a wee bit stuck on how best to construct this, and I want to figure out a clever and clean way to calculate it. I have two thoughts; number one feels uglier and heavier, but I'm not sure that method two is possible. Do you guys have any ideas? (The code I've written below is not correct, it is just to illustrate.)

Method one:

date_start = 2020-12-20
date_stop = today() 
date_range = date_stop - date_start
for i in dates:
    if between this date and - 14 in data then sum Values

Or maybe method two:

date_start = 2020-12-20
date_stop = today() 
date_range = date_stop - date_start
for i in dates:
    df.Value.ewm(span=14, adjust=False).mean()

EDIT:

Brilliant, cheers. That works like a charm.

This is what I came up with:

import pandas as pd
from datetime import datetime, timedelta

# read_csv already returns a DataFrame
df = pd.read_csv('data/df_random_hours.csv')
df2 = pd.read_csv('data/names.csv')
days = 365
date_start = datetime(2020, 1, 1)
date_stop = datetime(2021, 1, 1)

# wanted output columns: date, NameID, Name, FDPsum_14
date_diff = (date_stop - date_start).days
df['Start'] = pd.to_datetime(df['Start'])
print(df)

# total FDP per NameID (a Series indexed by NameID)
df_sum = df.groupby(['NameID'])['FDP'].sum()
print(df_sum)

for name_id in df_sum.index:
    for i in range(date_diff):
        # unfinished: there is no condition on name_id yet,
        # so the sum below still covers every NameID
        date_now = date_start + timedelta(days=i)
        date_old = date_now - timedelta(days=14)
        fdp_sum = df[df['Start'].between(date_old, date_now)]['FDP'].sum()

My next challenge is that the data is also divided by name or NameID. Looping like this is not going to work, because nothing tells the sum function that it should only look at rows where the NameID matches the current one...

I might have to make a new temporary df where I store all rows with the same NameID together, but again, it feels like there has to be an easier and cleaner way to do it.
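One possible way to keep the sum within each NameID, without a temporary df per name, is a grouped time-based rolling window. This is only a sketch: the small frame below is hypothetical stand-in data that reuses the column names Start, NameID and FDP, and `closed='both'` is chosen so that both ends of the 14-day window are included, like `between()` in the loop above.

```python
import pandas as pd

# Hypothetical miniature of the real frame (same column names, made-up rows).
df = pd.DataFrame({
    'NameID': [1, 1, 1, 2, 2],
    'Start': pd.to_datetime(['2020-01-01 08:00', '2020-01-10 09:30',
                             '2020-01-20 07:15', '2020-01-05 10:00',
                             '2020-01-06 12:00']),
    'FDP': [2.0, 3.0, 4.0, 10.0, 1.0],
})

# For every row, sum FDP over the trailing 14 days, separately per NameID.
fdp_14 = (
    df.sort_values('Start')            # time-based rolling needs sorted dates
      .set_index('Start')
      .groupby('NameID')['FDP']
      .rolling('14D', closed='both')   # window [Start - 14 days, Start]
      .sum()
      .rename('FDPsum_14')
      .reset_index()                   # columns: NameID, Start, FDPsum_14
)
print(fdp_14)
```

This gives one value per existing row rather than one per calendar day. If a value for every day is needed, each group would first have to be resampled or reindexed to a daily range.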

This is the df

           Unnamed: 0          Name  ...                 Stop       FDP
0               0   Jadyn Estes  ...  2020-01-01 14:20:00  7.333333
1               1   Jadyn Estes  ...  2020-01-02 11:10:00  4.166667
2               2   Jadyn Estes  ...  2020-01-03 13:40:00  6.000000
3               3   Jadyn Estes  ...  2020-01-04 14:20:00  3.333333
4               4   Jadyn Estes  ...  2020-01-05 17:20:00  3.333333
...           ...           ...  ...                  ...       ...
14795       14795  Ellie Dawson  ...  2020-12-25 20:30:00  8.000000
14796       14796  Ellie Dawson  ...  2020-12-26 19:10:00  6.333333
14797       14797  Ellie Dawson  ...  2020-12-27 15:40:00  3.666667
14798       14798  Ellie Dawson  ...  2020-12-30 13:30:00  3.500000
14799       14799  Ellie Dawson  ...  2020-12-31 23:50:00  8.833333
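
import pandas

# first convert strings to actual dates
df['Dates'] = pandas.to_datetime(df['Dates'])
# the example values are strings; make them numeric so sum() adds them
df['Value'] = pandas.to_numeric(df['Value'])
# sum the values between target dates (both endpoints are included)
df[df['Dates'].between('2020-10-01','2020-10-14')]['Value'].sum()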

should work fine...
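To apply the same idea to every day in a range rather than to one fixed pair of dates, a time-based rolling window may be cleaner than an explicit loop. The following is a minimal sketch, not a definitive solution. It uses the example data from the question, and the stop date 2021-01-20 is an assumption (the question uses today's date). The 2022-12-22 row falls outside that range and is therefore ignored.

```python
import pandas as pd

# Example data from the question; the values are strings there, so convert them.
data = {'Dates': ['2022-12-22', '2021-01-01', '2021-01-05',
                  '2021-01-07', '2021-01-10', '2021-01-13'],
        'Value': ['4', '3', '6', '5', '4', '3']}
df = pd.DataFrame(data)
df['Dates'] = pd.to_datetime(df['Dates'])
df['Value'] = pd.to_numeric(df['Value'])

# Assumed range: the start date from the question, an arbitrary stop date.
all_days = pd.date_range('2020-12-20', '2021-01-20', freq='D')

# One value per calendar day; days without data become NaN.
daily = df.groupby('Dates')['Value'].sum().reindex(all_days)

# Trailing window [day - 14 days, day], both ends included like between().
sum_14 = daily.fillna(0).rolling('14D', closed='both').sum()
# Mean over the days that actually have data; NaN if the window has none.
mean_14 = daily.rolling('14D', closed='both').mean()

result = pd.DataFrame({'sum_14': sum_14, 'mean_14': mean_14})
print(result.loc['2021-01-10':'2021-01-16'])
```

The mean here averages only the days that have a value. If missing days should count as zero, take the mean of `daily.fillna(0)` instead.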
