I'm trying to summarize columns A and B by mean, median, 25th percentile, 75th percentile, and standard deviation for each value of S.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 9, 3, 4, 6, 8, 2, 7],
                   'B': [2, 4, 7, 8, 9, 2, 5, 6],
                   'S': ['L', 'L', 'L', 'S', 'L', 'S', 'S', 'L']})
Here is what I did; it works as long as I only ask for the 25th percentile:
df.pivot_table(columns = ['S'], values = ['A','B'], aggfunc = [np.mean, lambda x: np.percentile(x,25), np.median, np.std])
But if I also put the 75th percentile in, it gives me the error message:
Reindexing only valid with uniquely valued Index objects
Ideally, I would like the output to list the 75th percentile in the next columns.
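The error most likely comes from the two lambdas sharing the name <lambda>, which produces duplicate column labels. This will do what I think you want, using named functions instead of lambdas at the cost of a few extra lines: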
def my25(g):
    return np.percentile(g, 25)

def my75(g):
    return np.percentile(g, 75)

df.pivot_table(columns=['S'], values=['A', 'B'],
               aggfunc=[np.mean, my25, np.median, np.std, my75])
   mean        my25      median         std        my75
S     L     S     L     S     L     S     L     S     L     S
A   5.2  4.67     3   3.0     6     4  3.19  3.06     7   6.0
B   5.6  5.00     4   3.5     6     5  2.70  3.00     7   6.5
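If more percentiles are needed, writing one named function per percentile gets repetitive. Here is a minimal sketch of a small factory that builds them. It is not part of the approach above: percentile is a hypothetical helper name, and the string names 'mean', 'median' and 'std' stand in for the NumPy functions. The key assumption is that pivot_table labels each aggregation by the function's __name__, so giving every generated function a distinct name keeps the column labels unique.

```python
import numpy as np
import pandas as pd


def percentile(q):
    """Build an aggregation function for the q-th percentile (q on a 0-100 scale)."""
    def agg(values):
        return np.percentile(values, q)
    # A distinct name per percentile gives a distinct column label.
    agg.__name__ = f"p{q}"
    return agg


df = pd.DataFrame({'A': [1, 9, 3, 4, 6, 8, 2, 7],
                   'B': [2, 4, 7, 8, 9, 2, 5, 6],
                   'S': ['L', 'L', 'L', 'S', 'L', 'S', 'S', 'L']})

# Same call shape as the pivot_table above, with generated functions.
result = df.pivot_table(columns='S', values=['A', 'B'],
                        aggfunc=['mean', percentile(25), 'median',
                                 'std', percentile(75)])
print(result)
```

The percentile columns should then appear under p25 and p75, with one sub-column per value of S.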
EDIT: actually, it is possible to use only lambda functions if you use groupby to aggregate instead of pivot_table, and supply a name to each function.
# np.percentile takes q on a 0-100 scale, so use 25 and 75 rather than 0.25 and 0.75
func_lst = [('mean', np.mean), ('25', lambda x: np.percentile(x, 25)),
            ('med', np.median), ('std', np.std), ('75', lambda x: np.percentile(x, 75))]
df.groupby('S').agg(func_lst).stack(level=0).unstack(level=0).swaplevel(0,1,axis=1)
   mean          25         med         std          75
S     L     S     L     S     L     S     L     S     L     S
A   5.2  4.67     3   3.0     6     4  3.19  3.06     7   6.0
B   5.6  5.00     4   3.5     6     5  2.70  3.00     7   6.5
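One detail to watch when writing the percentile lambdas: np.percentile takes q on a 0 to 100 scale, while np.quantile takes a fraction between 0 and 1. The short check below is only an illustration, using the values of column A for the rows where S is 'L'; the variable names are made up for the example.

```python
import numpy as np

# Column A for the rows where S == 'L' in the example DataFrame.
values = np.array([1, 9, 3, 6, 7])

p25 = np.percentile(values, 25)     # 25th percentile, q on a 0-100 scale
q25 = np.quantile(values, 0.25)     # same statistic, q given as a fraction
tiny = np.percentile(values, 0.25)  # the 0.25th percentile, close to the minimum

print(p25, q25, tiny)
```

Passing 0.25 to np.percentile does not raise an error. It quietly returns a value near the minimum, so the mistake is easy to miss.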
I thought passing func_lst to pivot_table might work, but it doesn't. In any case, I find it clearer to define the my25 and my75 functions and use pivot_table.
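If the goal is only these standard summary statistics, a different route that avoids custom functions altogether is groupby followed by describe, which already reports the mean, standard deviation, and the 25%, 50% (median) and 75% quantiles. This is a sketch of that alternative rather than the approach above. The names wanted and stats are illustrative, and the layout differs from the pivot_table output: statistics end up in the rows and the values of S in the columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 9, 3, 4, 6, 8, 2, 7],
                   'B': [2, 4, 7, 8, 9, 2, 5, 6],
                   'S': ['L', 'L', 'L', 'S', 'L', 'S', 'S', 'L']})

# Statistics to keep, using the labels that describe() produces.
wanted = ['mean', '25%', '50%', 'std', '75%']

# describe() yields columns (column, statistic); transpose so they become rows.
stats = df.groupby('S')[['A', 'B']].describe().T

# Keep only the requested statistics (drops count, min and max).
stats = stats[stats.index.get_level_values(1).isin(wanted)]
print(stats)
```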