I have a pandas data frame like this:
Account Id  Gross Sum  Invoice Type Name  Net Sum Company  Security  Supplier  Date Completed  YearMonth  Category
710830      282.81     Invoice            282.81           asd5a     Abc       1/1/2018        2018-1     Postal
445800      4868.71    Invoice            3926.4           adc6ac    Def       1/1/2018        2018-1     R&D
710350      282.81     Invoice            282.81           fgn6      Ghi       2/9/2018        2018-2     Other
710510      282.81     Invoice            282.81           dg        jkl       2/9/2018        2018-2     Electricity
710630      841.59     Invoice            707.07           dfvbfbf   mno       3/2/2018        2018-3     Repairs
710610      841.59     Invoice            707.07           rrcv      pqr       3/2/2018        2018-3     Leasing
710810      12.14      Invoice            10.12            btbfd     stu       1/1/2019        2019-1     Telephone
704300      81517.6    Invoice            65740            dfbtt     vwx       1/1/2019        2019-1     Statutory
710510      2105.64    Invoice            1776.53          dfdftb5   dfb       2/9/2019        2019-2     Electricity
710510      2105.64    Invoice            1776.53          ebdfb5b   bcd       2/9/2019        2019-2     Electricity
710920      66.96      Invoice            54               dfrrt65   efg       3/2/2019        2019-3     Data
700330      239.47     Invoice            239.47           aae3a11   hij       3/2/2019        2019-3     Coffee
What I want is to add rows at the bottom of the data frame that hold the average of the same month over the last 3 years.
For example, for YearMonth 2020-1 the calculation should be:
2020-1 = (sum(Net Sum Company) in 2019-1 + sum(Net Sum Company) in 2018-1 + sum(Net Sum Company) in 2017-1) / number of months considered
The number of months considered is 3, because only the last three years are taken into account. That gives me the average, which I want to append as a new row at the bottom containing nothing but the YearMonth and the average of the Net Sum Company column. The end goal is to get a data frame like this:
Account Id  Gross Sum  Invoice Type Name  Net Sum Company  Security  Supplier  Date Completed  YearMonth  Category
710830      282.81     Invoice            282.81           asd5a     Abc       1/1/2018        2018-1     Postal
445800      4868.71    Invoice            3926.4           adc6ac    Def       1/1/2018        2018-1     R&D
710350      282.81     Invoice            282.81           fgn6      Ghi       2/9/2018        2018-2     Other
710510      282.81     Invoice            282.81           dg        jkl       2/9/2018        2018-2     Electricity
710630      841.59     Invoice            707.07           dfvbfbf   mno       3/2/2018        2018-3     Repairs
710610      841.59     Invoice            707.07           rrcv      pqr       3/2/2018        2018-3     Leasing
710810      12.14      Invoice            10.12            btbfd     stu       1/1/2019        2019-1     Telephone
704300      81517.6    Invoice            65740            dfbtt     vwx       1/1/2019        2019-1     Statutory
710510      2105.64    Invoice            1776.53          dfdftb5   dfb       2/9/2019        2019-2     Electricity
710510      2105.64    Invoice            1776.53          ebdfb5b   bcd       2/9/2019        2019-2     Electricity
710920      66.96      Invoice            54               dfrrt65   efg       3/2/2019        2019-3     Data
700330      239.47     Invoice            239.47           aae3a11   hij       3/2/2019        2019-3     Coffee
-           -          -                  34979.66         -         -         -               2020-1     -
-           -          -                  2059.34          -         -         -               2020-2     -
-           -          -                  853.805          -         -         -               2020-3     -
I am new to pandas so any guidance is appreciated. This has to be strictly done using pandas only.
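To make the target numbers concrete, here is a minimal sketch that rebuilds the sample in pandas and checks the 2020-1 figure by hand. The comma-separated layout and the names `raw`, `monthly` and `avg_2020_1` are assumptions made for illustration; only the values come from the table above. Note that the sample has no 2017 rows, so the expected 34979.66 is the sum over 2018-1 and 2019-1 divided by 2, not by 3.

```python
import io

import pandas as pd

# Sample rows from the question, written as CSV (layout is an assumption)
raw = """Account Id,Gross Sum,Invoice Type Name,Net Sum Company,Security,Supplier,Date Completed,YearMonth,Category
710830,282.81,Invoice,282.81,asd5a,Abc,1/1/2018,2018-1,Postal
445800,4868.71,Invoice,3926.4,adc6ac,Def,1/1/2018,2018-1,R&D
710350,282.81,Invoice,282.81,fgn6,Ghi,2/9/2018,2018-2,Other
710510,282.81,Invoice,282.81,dg,jkl,2/9/2018,2018-2,Electricity
710630,841.59,Invoice,707.07,dfvbfbf,mno,3/2/2018,2018-3,Repairs
710610,841.59,Invoice,707.07,rrcv,pqr,3/2/2018,2018-3,Leasing
710810,12.14,Invoice,10.12,btbfd,stu,1/1/2019,2019-1,Telephone
704300,81517.6,Invoice,65740,dfbtt,vwx,1/1/2019,2019-1,Statutory
710510,2105.64,Invoice,1776.53,dfdftb5,dfb,2/9/2019,2019-2,Electricity
710510,2105.64,Invoice,1776.53,ebdfb5b,bcd,2/9/2019,2019-2,Electricity
710920,66.96,Invoice,54,dfrrt65,efg,3/2/2019,2019-3,Data
700330,239.47,Invoice,239.47,aae3a11,hij,3/2/2019,2019-3,Coffee
"""
df = pd.read_csv(io.StringIO(raw))

# Total Net Sum Company per YearMonth
monthly = df.groupby('YearMonth')['Net Sum Company'].sum()

# Only two Januaries are present in the sample, so divide by 2
avg_2020_1 = (monthly['2018-1'] + monthly['2019-1']) / 2
print(round(avg_2020_1, 3))  # 34979.665
```

The same arithmetic on the February and March rows gives 2059.34 and 853.805, the other two values in the desired output.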
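For a simple 3-year rolling average, do something like this:
import pandas as pd
df1['Date Completed'] = pd.to_datetime(df1['Date Completed'])
df1 = df1.sort_values('Date Completed')  # a time-based window needs sorted dates
df1['roll_3y_avg'] = df1.rolling(window='1096D', on='Date Completed', closed='right')['Net Sum Company'].mean()  # 1096 days = 3 years incl. one leap day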
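To see what this produces, here is a small sketch on made-up data (the `demo` frame and its values are hypothetical, not taken from the question). A time-based window averages every row in the trailing 1096 days regardless of month, so it is not the same thing as a same-month average.

```python
import pandas as pd

# Hypothetical data: three January rows and one July row
demo = pd.DataFrame({
    'Date Completed': pd.to_datetime(
        ['2018-01-01', '2018-07-01', '2019-01-01', '2020-01-01']),
    'Net Sum Company': [10.0, 100.0, 20.0, 30.0],
}).sort_values('Date Completed')

# Trailing 1096-day mean, evaluated at each row
demo['roll_3y_avg'] = demo.rolling(
    window='1096D', on='Date Completed', closed='right'
)['Net Sum Company'].mean()

# Mean of the January rows only, for comparison
january_avg = demo.loc[
    demo['Date Completed'].dt.month == 1, 'Net Sum Company'
].mean()

print(demo['roll_3y_avg'].iloc[-1], january_avg)
```

On this data the last rolling value is the mean of all four rows (the July row included), while the January-only mean uses three rows, so the two numbers differ.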
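IIUC, you want to:
- compute the average per month of the Net Sum Company column over the 3 previous years
- append new rows that hold that average together with the new value of the YearMonth column
Code could be: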
import pandas as pd

# extract Year and Month Series from the dataframe
year = df['YearMonth'].str.slice(stop=4).astype(int)
month = df['YearMonth'].str.slice(start=5)
# compute the new year per month as max(year) + 1
newyear_month = year.groupby(month).max() + 1
# build a Series aligned with the dataframe from that new year
newyear = month.map(newyear_month)
# keep only the rows from the 3 years before the new year
in_window = (newyear - 3 <= year) & (year <= newyear - 1)
# compute the sum of relevant years per month
tmp = df.loc[in_window, 'Net Sum Company'].groupby(month).sum()
# divide by the number of distinct YearMonth values behind each sum
tmp = tmp / df.loc[in_window, 'YearMonth'].groupby(month).nunique()
# compute a YearMonth column for that new dataframe
tmp = pd.concat([newyear_month.astype(str), tmp.rename('Net Sum Company')], axis=1)
tmp['YearMonth'] = tmp['YearMonth'] + '-' + tmp.index  # tmp is indexed by month
# force the type of Account Id to object to allow it to contain null values
df['Account Id'] = df['Account Id'].astype(object)
# DataFrame.append was removed in pandas 2.0: concat the new rows and reset the index
new_df = pd.concat([df, tmp], sort=False, ignore_index=True)
With your sample, new_df gives:
    Account Id  Gross Sum  Invoice Type Name  Net Sum Company  Security  Supplier  Date Completed  YearMonth  Category
0   710830      282.81     Invoice            282.810          asd5a     Abc       1/1/2018        2018-1     Postal
1   445800      4868.71    Invoice            3926.400         adc6ac    Def       1/1/2018        2018-1     R&D
2   710350      282.81     Invoice            282.810          fgn6      Ghi       2/9/2018        2018-2     Other
3   710510      282.81     Invoice            282.810          dg        jkl       2/9/2018        2018-2     Electricity
4   710630      841.59     Invoice            707.070          dfvbfbf   mno       3/2/2018        2018-3     Repairs
5   710610      841.59     Invoice            707.070          rrcv      pqr       3/2/2018        2018-3     Leasing
6   710810      12.14      Invoice            10.120           btbfd     stu       1/1/2019        2019-1     Telephone
7   704300      81517.60   Invoice            65740.000        dfbtt     vwx       1/1/2019        2019-1     Statutory
8   710510      2105.64    Invoice            1776.530         dfdftb5   dfb       2/9/2019        2019-2     Electricity
9   710510      2105.64    Invoice            1776.530         ebdfb5b   bcd       2/9/2019        2019-2     Electricity
10  710920      66.96      Invoice            54.000           dfrrt65   efg       3/2/2019        2019-3     Data
11  700330      239.47     Invoice            239.470          aae3a11   hij       3/2/2019        2019-3     Coffee
12  NaN         NaN        NaN                34979.665        NaN       NaN       NaN             2020-1     NaN
13  NaN         NaN        NaN                2059.340         NaN       NaN       NaN             2020-2     NaN
14  NaN         NaN        NaN                853.805          NaN       NaN       NaN             2020-3     NaN
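Remarks:
The new rows contain NaN in every column except Net Sum Company and YearMonth. If you would rather have empty strings there, use:
new_df = new_df.fillna('')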
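For comparison, the same idea can be sketched more compactly by first summing per (year, month) and then averaging those sums per month. This is only a sketch under stated assumptions, not the method above: the helper `same_month_average` and its parameters are hypothetical names, it uses a single target year (overall maximum year + 1) for every month, and it averages over the years that actually have data in the window.

```python
import pandas as pd

def same_month_average(df, n_years=3, value_col='Net Sum Company'):
    """Hypothetical helper: per-month mean of yearly sums over the last n_years."""
    parts = df['YearMonth'].str.split('-', expand=True).astype(int)
    year = parts[0].rename('year')
    month = parts[1].rename('month')
    # Assumption: one target year for all months
    target_year = int(year.max()) + 1
    window = year.between(target_year - n_years, target_year - 1)
    # Sum per (year, month), then average those sums per month
    yearly_sums = (
        df.loc[window, value_col]
        .groupby([year[window], month[window]])
        .sum()
    )
    avg = yearly_sums.groupby(level='month').mean()
    return pd.DataFrame({
        value_col: avg.to_numpy(),
        'YearMonth': [f'{target_year}-{m}' for m in avg.index],
    })

# The two relevant columns of the sample from the question
sample = pd.DataFrame({
    'YearMonth': ['2018-1', '2018-1', '2018-2', '2018-2', '2018-3', '2018-3',
                  '2019-1', '2019-1', '2019-2', '2019-2', '2019-3', '2019-3'],
    'Net Sum Company': [282.81, 3926.4, 282.81, 282.81, 707.07, 707.07,
                        10.12, 65740, 1776.53, 1776.53, 54, 239.47],
})

extra = same_month_average(sample)
result = pd.concat([sample, extra], ignore_index=True, sort=False)
print(extra)
```

With the full frame, the columns missing from `extra` would come out as NaN after `pd.concat`, as in the output shown earlier.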