So I need to group rows by the 'fh_status' column and then compute the min, mean and max of 'gini' for each group (there will be three groups). I came up with this code:
m = df2.groupby(['fh_status']).max().iloc[:, 2]   # position 2 should correspond to the 'gini' column
n = df2.groupby(['fh_status']).min().iloc[:, 2]
e = df2.groupby(['fh_status']).mean().iloc[:, 2]
nl = '\n'
print(f' mean: {e} {nl} maximum: {m} {nl} minimum:{n}')
output:
mean: fh_status
free 38.170175
not free 39.750000
partly free 43.931250
Name: gini, dtype: float64
maximum: fh_status
free 10.0
not free 5.0
partly free 9.0
Name: polity09, dtype: float64
minimum:fh_status
free 6.0
not free -10.0
partly free -6.0
Name: polity09, dtype: float64
Chaining these three methods in one expression didn't work (as far as I can tell, only the last one is applied), so I ended up with three separate variables, and they're a bit clumsy. The output seems right, but I'm pretty sure there is a way to optimise this and reduce the amount of code. Or isn't there?
Yes, you can use .agg(..) and pass a list of operations:

df2.groupby('fh_status')['gini'].agg(['min', 'mean', 'max'])

This will produce a dataframe with the aggregates (min, mean, max) as columns, and the groups (the values you grouped over with .groupby(..)) as rows.
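Here is a self-contained sketch of that one-liner; the sample values for 'fh_status' and 'gini' are made up for illustration, since the question's df2 isn't shown in full:

```python
import pandas as pd

# Hypothetical stand-in for df2 -- only the 'fh_status' and 'gini'
# column names come from the question; the values are invented.
df2 = pd.DataFrame({
    'fh_status': ['free', 'free', 'not free', 'partly free', 'partly free'],
    'gini': [30.0, 40.0, 50.0, 35.0, 45.0],
})

# One groupby, one pass: min, mean and max of 'gini' per group,
# returned as a single dataframe instead of three separate Series.
stats = df2.groupby('fh_status')['gini'].agg(['min', 'mean', 'max'])
print(stats)
```

Selecting the column by name ('gini') also sidesteps the positional pitfall in the question: .mean() silently drops non-numeric columns while .max() and .min() do not, so .iloc[:, 2] can land on different columns per aggregate, which is likely why the max/min output above is labelled polity09 rather than gini.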