I have some code in which I want to calculate a sum or an average between two dates.
Example data that is in a Pandas DataFrame:
data = {'Dates':['2022-12-22','2021-01-01','2021-01-05','2021-01-07','2021-01-10','2021-01-13'],
'Value':['4','3','6','5','4','3']}
I want to loop through all dates, not just the ones in the column, so from a start date to an end date, and for each date get the average or sum of the values from that date back to, for example, 14 days earlier.
So I'm a wee bit stuck on how best to construct this, and I want to figure out a clever and clean way to calculate it. I have two thoughts. Number one feels uglier and heavier, but I'm not sure that method two is possible. Do you guys have any ideas? (The code I've written below is not correct, it is just to illustrate.)
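Method one:
import pandas as pd
date_start = pd.Timestamp('2020-12-20')
date_stop = pd.Timestamp.today().normalize()
dates = pd.date_range(date_start, date_stop)
for day in dates:
    # if a row's date is between `day` minus 14 days and `day`, sum its Value
    # (assumes df['Dates'] is already datetime and df['Value'] is numeric)
    total = df[df['Dates'].between(day - pd.Timedelta(days=14), day)]['Value'].sum()
Or maybe method two:
date_start = pd.Timestamp('2020-12-20')
date_stop = pd.Timestamp.today().normalize()
dates = pd.date_range(date_start, date_stop)
for day in dates:
    # idea only: an exponentially weighted mean with a span of 14
    df['Value'].ewm(span=14, adjust=False).mean()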
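A note on method two: as far as I can tell, ewm(span=14) is not a 14-day window. It is an exponentially weighted mean over rows, so it answers a slightly different question than "average of the last 14 days". A minimal sketch of the difference, using made-up numbers and a span of 3 to keep it short:

```python
import pandas as pd

values = pd.Series([4.0, 2.0, 6.0])  # made-up numbers, not the question's data

# Plain mean of the last 3 rows: every row in the window counts equally.
rolling_mean = values.rolling(window=3, min_periods=1).mean()

# Exponentially weighted mean: with span=3 the smoothing factor is
# 2 / (3 + 1) = 0.5, so older rows fade out gradually instead of
# dropping out of a fixed window.
ewm_mean = values.ewm(span=3, adjust=False).mean()

print(rolling_mean.tolist())  # [4.0, 3.0, 4.0]
print(ewm_mean.tolist())      # [4.0, 3.0, 4.5]
```

So if a hard 14-day cutoff is what is wanted, rolling (or the between filter from method one) is probably the closer fit.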
EDIT:
Brilliant, cheers. That works like a charm.
This is what I came up with:
import pandas as pd
from datetime import datetime, timedelta

# read_csv already returns a DataFrame, so no extra pd.DataFrame() call is needed
df = pd.read_csv('data/df_random_hours.csv')
df2 = pd.read_csv('data/names.csv')
days = 365
date_start = datetime(2020, 1, 1)
date_stop = datetime(2021, 1, 1)
# wanted output columns: date, NameID, Name, FDPsum_14
date_diff = (date_stop - date_start).days
df['Start'] = pd.to_datetime(df['Start'])
print(df)
df_sum = df.groupby(['NameID'])['FDP'].sum()
print(df_sum)
for x in range(len(df_sum)):
    for i in range(date_diff):
        # unfinished: if df_sum['NameID'][x] = ...
        date_now = date_start + timedelta(days=i)
        date_old = date_now - timedelta(days=14)
        # sums FDP over the 14-day window, but across all NameIDs
        fdp_sum = df[df['Start'].between(date_old, date_now)]['FDP'].sum()
My next challenge is that the data is also divided by name or NameID. Looping like this is not going to work, because nothing tells the sum function to only look at rows where the NameID matches the current one.
I might have to make a new temporary df where I store all rows with the same NameID together, but again, it feels like there has to be an easier and cleaner way to do it.
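One possible way to avoid both the temporary df and the double loop is to build a table with one row per calendar day and one column per NameID, and then take a rolling sum down each column. This is only a sketch under some assumptions: the small frame below is made up and merely reuses the column names NameID, Start and FDP, the date range is hypothetical, and whole calendar days are counted (a shift starting at any time on a day counts for that day).

```python
import pandas as pd

# Tiny made-up stand-in for the real data (column names as in the question).
df = pd.DataFrame({
    'NameID': [1, 1, 1, 2, 2],
    'Start': pd.to_datetime(['2020-01-01 08:00', '2020-01-03 09:00',
                             '2020-01-20 07:00', '2020-01-02 10:00',
                             '2020-01-16 12:00']),
    'FDP': [7.0, 4.0, 6.0, 3.0, 5.0],
})

# Every calendar day to report on (hypothetical range for this sketch).
all_days = pd.date_range('2020-01-01', '2020-01-31', freq='D')

# Daily FDP total per NameID: one row per day, one column per NameID,
# with days that have no rows filled in as 0.
daily = (df.assign(day=df['Start'].dt.normalize())
           .pivot_table(index='day', columns='NameID', values='FDP',
                        aggfunc='sum')
           .reindex(all_days)
           .fillna(0))

# 15 rows = the day itself plus the 14 days before it; the rolling sum is
# computed separately for each column, so NameIDs never mix.
fdp_sum_14 = daily.rolling(window=15, min_periods=1).sum()

print(fdp_sum_14.loc[pd.Timestamp('2020-01-16')])
```

Each column of fdp_sum_14 is then the trailing sum for a single NameID, and the names from the second file could be joined on afterwards if needed.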
This is the df
Unnamed: 0 Name ... Stop FDP
0 0 Jadyn Estes ... 2020-01-01 14:20:00 7.333333
1 1 Jadyn Estes ... 2020-01-02 11:10:00 4.166667
2 2 Jadyn Estes ... 2020-01-03 13:40:00 6.000000
3 3 Jadyn Estes ... 2020-01-04 14:20:00 3.333333
4 4 Jadyn Estes ... 2020-01-05 17:20:00 3.333333
... ... ... ... ... ...
14795 14795 Ellie Dawson ... 2020-12-25 20:30:00 8.000000
14796 14796 Ellie Dawson ... 2020-12-26 19:10:00 6.333333
14797 14797 Ellie Dawson ... 2020-12-27 15:40:00 3.666667
14798 14798 Ellie Dawson ... 2020-12-30 13:30:00 3.500000
14799 14799 Ellie Dawson ... 2020-12-31 23:50:00 8.833333
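import pandas as pd
# first convert the strings to actual dates (and the values to numbers)
df['Dates'] = pd.to_datetime(df['Dates'])
df['Value'] = pd.to_numeric(df['Value'])
# sum the values between the target dates (both endpoints are included)
df[df['Dates'].between('2020-10-01', '2020-10-14')]['Value'].sum()
should work fine. Use .mean() instead of .sum() to get the average.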
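To get that sum for every calendar day between a start and an end date without an explicit loop, something along these lines might work. It is a sketch on the example data from the question; the date range is a hypothetical one picked to cover the sample, and days without data are assumed to count as 0.

```python
import pandas as pd

data = {'Dates': ['2022-12-22', '2021-01-01', '2021-01-05',
                  '2021-01-07', '2021-01-10', '2021-01-13'],
        'Value': ['4', '3', '6', '5', '4', '3']}
df = pd.DataFrame(data)
df['Dates'] = pd.to_datetime(df['Dates'])
df['Value'] = pd.to_numeric(df['Value'])

# Every calendar day to report on (hypothetical range for this sketch).
all_days = pd.date_range('2020-12-20', '2021-01-20', freq='D')

# One row per calendar day; days with no data get 0, and rows outside
# the range (here 2022-12-22) are dropped by the reindex.
daily = df.groupby('Dates')['Value'].sum().reindex(all_days, fill_value=0)

# 15 rows = the day itself plus the 14 days before it, which matches
# between(day - 14 days, day) with both endpoints included.
trailing_sum = daily.rolling(window=15, min_periods=1).sum()

print(trailing_sum.loc[pd.Timestamp('2021-01-13')])  # 21.0
```

Note that calling .mean() on this rolling window would average over the zero-filled days as well, so it is not the same as the average of only the rows that exist in the window.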