I would like to create a pivot table that uses multiple aggfuncs, specifically np.mean and np.std. By default, pandas.pivot_table places the aggfuncs at the top of the column hierarchy. With only one column in values this wouldn't matter, but I have 7, which makes the table tedious to read. I would like to put the values at the top of the hierarchy instead, so that each value has a mean and a std column beneath it. Is there a way to do this, or am I out of luck?
Thanks for any help!
Here is a small excerpt of the data.
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A A 0 4247 5684 2068 393 237 16286
A A 0 0 2366 4159 3155 696 341
A B 18198 0 1114 1871 5392 1954 755
A B 17829 0 2695 2366 3768 1289 445
A C 18352 0 3545 7508 5099 2071 1239
I want the output to look like this:
              Col 3           Col 4           Col 5           Col 6           Col 7           ...
Col 1  Col 2  Mean    Std     Mean    Std     Mean    Std     Mean    Std     Mean    Std     ...
A      A      0       0       2123.5  2173.5  ...
       B      ...
       C      ...
I'm not going to run through all the calculations right now, but I think that gets the point across, since this is a formatting question.
I had to change your headings from "Col 1" to "Col1" (no space).
import pandas as pd
df=pd.read_clipboard()
df
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A A 0 4247 5684 2068 393 237 16286
A A 0 0 2366 4159 3155 696 341
A B 18198 0 1114 1871 5392 1954 755
A B 17829 0 2695 2366 3768 1289 445
A C 18352 0 3545 7508 5099 2071 1239
In [9]: import numpy as np
In [10]: np.round(df.groupby(['Col1', 'Col2']).agg(['mean', 'std']),4)
Out[10]:
Col3 Col4 Col5 Col6 \
mean std mean std mean std mean
Col1 Col2
A A 0.0 0.0000 2123.5 3003.0825 4025.0 2346.1803 3113.5
B 18013.5 260.9224 0.0 0.0000 1904.5 1117.9358 2118.5
C 18352.0 NaN 0.0 NaN 3545.0 NaN 7508.0
Col7 Col8 Col9
std mean std mean std mean std
Col1 Col2
A A 1478.5603 1774 1953.0289 466.5 324.562 8313.5 11274.8176
B 350.0179 4580 1148.3414 1621.5 470.226 600.0 219.2031
C NaN 5099 NaN 2071.0 NaN 1239.0 NaN
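If you would rather stay with pivot_table itself, one possible sketch is to build the table as usual and then swap the two column levels so the value names end up on top. This assumes the excerpt above typed in by hand (limited to Col3 through Col5 for brevity) and uses the string names 'mean' and 'std' in place of np.mean and np.std:

```python
import pandas as pd

# Hand-typed copy of the excerpt from the question (Col3..Col5 only, for brevity).
df = pd.DataFrame({
    "Col1": ["A", "A", "A", "A", "A"],
    "Col2": ["A", "A", "B", "B", "C"],
    "Col3": [0, 0, 18198, 17829, 18352],
    "Col4": [4247, 0, 0, 0, 0],
    "Col5": [5684, 2366, 1114, 2695, 3545],
})

# With a list of aggfuncs, pivot_table puts the aggfunc names on the top column level.
table = pd.pivot_table(
    df,
    index=["Col1", "Col2"],
    values=["Col3", "Col4", "Col5"],
    aggfunc=["mean", "std"],
)

# Swap the two column levels so the value names are on top,
# then sort so each value's mean/std columns sit next to each other.
table.columns = table.columns.swaplevel(0, 1)
table = table.sort_index(axis=1)

print(table)
```

The sort_index(axis=1) step is what groups the mean and std columns under each value; without it the columns keep their original order, just with the levels flipped.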
I think this can be solved using a combo of .describe() (which has both mean and std stats) and .pivot. Say you have a DF that looks like this:
print(df)
City Country lon
0 Dubai United Arab Emirates 55.307484
254 Buenos Aires Argentina -58.381592
1002 Rosario Argentina -60.666500
1162 Punta Arenas Chile -70.916473
1178 San Miguel Argentina -65.217590
and you want to get the statistics on the 'lon' column for each country. Use .describe to get the statistics:
stats = df.groupby('Country').describe()
#reset index so that you can specify the columns later
stats.reset_index(level = [0,1], inplace = True)
stats.head()
Country level_1 lon
0 Albania count 1.0000
1 Albania mean 19.8318
2 Albania std NaN
3 Albania min 19.8318
4 Albania 25% 19.8318
Then do a pivot table based on your stats table. Since the result will be multi-indexed, you need to specify tuple columns to access the mean and std columns:
stats.pivot(index='Country', columns='level_1')[[('lon', 'mean'), ('lon', 'std')]]
the result will be something like:
lon
level_1 mean std
Country
Albania 19.831800 NaN
Algeria 2.744837 3.323134
Angola 13.234444 NaN
Argentina -63.806806 4.101027
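Note that on more recent pandas versions, groupby(...).describe() typically returns the statistics as columns already, so the reset_index and pivot steps may not apply there. A minimal sketch under that assumption, using only the five sample rows printed above (so the numbers will differ from the full-data result):

```python
import pandas as pd

# Only the five sample rows shown above; the full dataset is not available here.
df = pd.DataFrame({
    "City": ["Dubai", "Buenos Aires", "Rosario", "Punta Arenas", "San Miguel"],
    "Country": ["United Arab Emirates", "Argentina", "Argentina", "Chile", "Argentina"],
    "lon": [55.307484, -58.381592, -60.666500, -70.916473, -65.217590],
})

# describe() on the grouped frame yields two-level columns: (column name, statistic).
stats = df.groupby("Country")[["lon"]].describe()

# Select the mean and std columns with tuple keys, as in the pivot version.
result = stats.loc[:, [("lon", "mean"), ("lon", "std")]]

print(result)
```

Countries with a single row get NaN for std, just as in the output above.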
Let me know if this helps--good luck.