
Python Pandas Pivot Table Calculations

I am trying to figure out how to calculate the mean values for each row in this Python Pandas Pivot table that I have created.

I also want to add the sum of each year at the bottom of the pivot table.

The last step is to take the average value for each month calculated above and divide it by the total average, in order to get the average distribution per year.

import pandas as pd 
import pandas_datareader.data as web
import datetime

start = datetime.datetime(2011, 1, 1)
end = datetime.datetime(2017, 12, 31)

libor = web.DataReader('USD1MTD156N', 'fred', start, end) # Reading the data
libor = libor.dropna(axis=0, how='any') # Dropping the NaN values
libor = libor.resample('M').mean() # Calculating the mean value per month
libor['Month'] = pd.DatetimeIndex(libor.index).month # Adding the month of each row
libor['Year'] = pd.DatetimeIndex(libor.index).year # Adding the year of each row

pivot = libor.pivot(index='Month',columns='Year',values='USD1MTD156N')
print pivot

Any suggestions on how to proceed? Thank you in advance.

I think this is what you want (this is on Python 3; I think only the print command is different in this script):

# Mean of each row (one value per month, averaged across years)
ave_month = pivot.mean(1)
# Sum of each year (one value per column), for the bottom of the pivot table
sum_year = pivot.sum(0)
# Each year's sum divided by the mean of the yearly sums
ave_year = sum_year/sum_year.mean()
print(ave_month, '\n', sum_year, '\n', ave_year)
Month
1     0.324729
2     0.321348
3     0.342014
4     0.345907
5     0.345993
6     0.369418
7     0.382524
8     0.389976
9     0.392838
10    0.392425
11    0.406292
12    0.482017
dtype: float64 
 Year
2011     2.792864
2012     2.835645
2013     2.261839
2014     1.860015
2015     2.407864
2016     5.953718
2017    13.356432
dtype: float64 
 Year
2011    0.621260
2012    0.630777
2013    0.503136
2014    0.413752
2015    0.535619
2016    1.324378
2017    2.971079
dtype: float64
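
If the goal is to have the row means and the yearly sums inside the table itself, and to get the monthly distribution the question describes (each month's mean divided by the overall mean), something along these lines might work. This is only a sketch: it uses a small made-up pivot instead of the FRED data, and the names month_mean, month_share, year_total and report, as well as the "Mean", "Share" and "Total" labels, are my own choices, not anything from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the LIBOR pivot: months as rows, years as columns.
rng = np.random.default_rng(0)
pivot = pd.DataFrame(
    rng.uniform(0.1, 1.5, size=(12, 3)),
    index=pd.Index(range(1, 13), name="Month"),
    columns=pd.Index([2015, 2016, 2017], name="Year"),
)

# Mean across years for each month (one value per row).
month_mean = pivot.mean(axis=1)

# One reading of "average distribution": each month's mean
# divided by the mean of all monthly means.
month_share = month_mean / month_mean.mean()

# Sum of each year (one value per column).
year_total = pivot.sum(axis=0)

# Attach the per-month results as columns, then the yearly sums as a bottom row.
report = pivot.copy()
report["Mean"] = month_mean
report["Share"] = month_share
# The "Mean" and "Share" cells of the "Total" row are left as NaN.
report.loc["Total"] = year_total.reindex(report.columns)

print(report)
```

The shares average to 1 by construction, so a value above 1 means that month tends to sit above the overall average.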

I would use pivot_table over pivot, and then use the aggfunc parameter.

pivot = libor.pivot(index='Month',columns='Year',values='USD1MTD156N')

would be

import numpy as np
pivot = libor.pivot_table(index='Month',columns='Year',values='USD1MTD156N', aggfunc=np.mean)

You should also be able to drop the resample statement, if I'm not mistaken, since pivot_table averages all the rows that share the same month and year.
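
To check that claim, here is a rough sketch that builds the pivot straight from daily rows and compares it with an explicit group-by average. The daily values are random numbers standing in for the FRED download, and the name reference is just for illustration. I pass aggfunc as the string 'mean', which pivot_table accepts in place of np.mean.

```python
import numpy as np
import pandas as pd

# Made-up daily series standing in for the FRED download.
idx = pd.date_range("2016-01-01", "2017-12-31", freq="D")
rng = np.random.default_rng(1)
libor = pd.DataFrame({"USD1MTD156N": rng.uniform(0.4, 1.6, size=len(idx))}, index=idx)

libor["Month"] = libor.index.month
libor["Year"] = libor.index.year

# No resample step: pivot_table averages every daily row in each (Month, Year) cell.
pivot = libor.pivot_table(index="Month", columns="Year",
                          values="USD1MTD156N", aggfunc="mean")

# Reference: explicit monthly averages, reshaped to the same layout.
reference = libor.groupby(["Month", "Year"])["USD1MTD156N"].mean().unstack("Year")

print(np.allclose(pivot.to_numpy(), reference.to_numpy()))
```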

A link to the docs:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html
