Say I have a multi-index dataframe like the following:
                  A         B         C
X   Y
bar one   -0.007381 -0.365315 -0.024817
    two   -1.219794  0.370955 -0.795125
baz one    0.145578  1.428502 -0.408384
    two   -0.249321 -0.292967 -1.849202
    three -0.249321 -0.292967 -1.849202
    four   0.21     -0.967123  1.202234
foo one   -1.046479 -1.250595  0.781722
    two    1.314373  0.333150  0.133331
qux one    0.716789  0.616471 -0.298493
    two    0.385795 -0.915417 -1.367644
I would like to get the maximum value of A for each value of the first level (X), and collect the second-level index (Y) at which it occurs.
How can I do this in Pandas?
In [87]: df.loc[df['A'].groupby(level='X').idxmax(), 'A']
Out[87]:
X    Y
bar  one    -0.007381
baz  four    0.210000
foo  two     1.314373
qux  one     0.716789
Name: A, dtype: float64
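For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame from the values shown in the question (the variable names index, idx and result are just illustrative) and then applies the same idxmax lookup:

```python
import pandas as pd

# Rebuild the example frame from the question; the level names X and Y
# match the printed index header.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"),
     ("baz", "one"), ("baz", "two"), ("baz", "three"), ("baz", "four"),
     ("foo", "one"), ("foo", "two"),
     ("qux", "one"), ("qux", "two")],
    names=["X", "Y"],
)
df = pd.DataFrame(
    {
        "A": [-0.007381, -1.219794, 0.145578, -0.249321, -0.249321,
              0.21, -1.046479, 1.314373, 0.716789, 0.385795],
        "B": [-0.365315, 0.370955, 1.428502, -0.292967, -0.292967,
              -0.967123, -1.250595, 0.333150, 0.616471, -0.915417],
        "C": [-0.024817, -0.795125, -0.408384, -1.849202, -1.849202,
              1.202234, 0.781722, 0.133331, -0.298493, -1.367644],
    },
    index=index,
)

# idxmax returns, for each X, the full (X, Y) label of the row where A is largest.
idx = df["A"].groupby(level="X").idxmax()

# Using those labels with .loc keeps both index levels next to the maximum.
result = df.loc[idx, "A"]
print(result)
```

Because idxmax returns full (X, Y) tuples, the second-level label comes along for free; result.index.get_level_values("Y") would give just the Y labels.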
To find the median values, you could use
df['A'].groupby(level='X').median()
but it is less clear which row should be associated with the median, since if there is an even number of rows in a group, the average of the middle rows is used to compute the median. The median is thus not associated with one row, but rather two.
If you make an arbitrary decision, such as wanting the n//2-th row (rather than the (n-1)//2-th row), then you could use
grouped = df['A'].groupby(level='X', sort=True)
df.loc[grouped.apply(lambda grp: grp.index[grp.count()//2]), 'A']
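to pick an "associated" row for each group. Note that grp.index[grp.count()//2] takes the row at position n//2 in the group's existing order, so it lines up with the median only if each group is already sorted by A.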
For example,
In [93]: df.loc[grouped.apply(lambda grp: grp.index[grp.count()//2]), 'A']
Out[93]:
X    Y
bar  two     -1.219794
baz  three   -0.249321
foo  two      1.314373
qux  two      0.385795
Name: A, dtype: float64
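The expression above picks the row at position n//2 in each group's existing order. If the associated row should instead be the upper-middle row by value, one possible sketch is to sort each group before taking position n//2. The helper name upper_median_label is hypothetical, and the data are the A column copied from the question:

```python
import pandas as pd

# Column A from the question, with the same (X, Y) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"),
     ("baz", "one"), ("baz", "two"), ("baz", "three"), ("baz", "four"),
     ("foo", "one"), ("foo", "two"),
     ("qux", "one"), ("qux", "two")],
    names=["X", "Y"],
)
a = pd.Series(
    [-0.007381, -1.219794, 0.145578, -0.249321, -0.249321,
     0.21, -1.046479, 1.314373, 0.716789, 0.385795],
    index=index,
    name="A",
)


def upper_median_label(grp):
    """Return the (X, Y) label of the n//2-th smallest non-missing value."""
    # Drop NaN and sort by value so that position n//2 is the upper-middle row.
    # Assumes the group has at least one non-missing value.
    ordered = grp.dropna().sort_values(kind="mergesort")
    return ordered.index[len(ordered) // 2]


# One label per X group; groupby iterates the groups in sorted key order.
labels = [upper_median_label(grp) for _, grp in a.groupby(level="X")]
median_rows = a.loc[labels]
print(median_rows)
```

For groups with an odd number of rows this is the row holding the true median; for even-sized groups it is the larger of the two middle values, which is the arbitrary choice described above.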
Use a groupby object:
groups = df['A'].groupby(level='X')
groups.min()
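As a rough illustration of what the groupby object returns (data copied from column A of the question; the variable names are illustrative), the reductions min() and max() are indexed by X only, while idxmin() and idxmax() return the full (X, Y) label of the row:

```python
import pandas as pd

# Column A from the question, with the same (X, Y) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"),
     ("baz", "one"), ("baz", "two"), ("baz", "three"), ("baz", "four"),
     ("foo", "one"), ("foo", "two"),
     ("qux", "one"), ("qux", "two")],
    names=["X", "Y"],
)
a = pd.Series(
    [-0.007381, -1.219794, 0.145578, -0.249321, -0.249321,
     0.21, -1.046479, 1.314373, 0.716789, 0.385795],
    index=index,
    name="A",
)

groups = a.groupby(level="X")

# Plain reductions: one value per X, the Y level is dropped.
mins = groups.min()
maxs = groups.max()

# Label lookups: one (X, Y) tuple per X, pointing at the extreme row.
min_labels = groups.idxmin()
max_labels = groups.idxmax()

print(mins)
print(max_labels)
```

So min() or max() alone answers "what is the value", and the idxmin() or idxmax() counterpart is needed to recover the second-level label.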