How to filter MultiIndex dataframe based on 1st level max values?
I have the following Series s with a two-level MultiIndex:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 3, 2]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
first  second
bar    1        -0.493897
       2        -0.274826
baz    1        -0.337298
       2        -0.564097
foo    1        -1.545826
       2         0.159494
qux    3        -0.876819
       2         0.780388
dtype: float64
I would like to convert it to:
first  second
bar    2        -0.274826
baz    2        -0.564097
foo    2         0.159494
qux    3        -0.876819
dtype: float64
That is, for every value of first, I want to keep only the row with the largest second.
I tried s.groupby(level=1).apply(max), but this returns:
second
1 -0.337298
2 0.780388
dtype: float64
Clearly my attempt returns the maximum value within each group of second, instead of the row with the maximum second for each first.
Any idea how to do this?
Use idxmax and label-based indexing. Note that this keeps, for each first, the row with the largest value rather than the largest second:
s[s.groupby(level=0).idxmax()]
Output:
first second
bar 2 0.482328
baz 1 0.244788
foo 2 1.310233
qux 2 0.297813
dtype: float64
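If the goal is the largest second per first, as the question describes, a boolean mask built from the index levels is one possible approach. The following is only a sketch under that reading of the question; the names levels, max_second, mask and result are made up for illustration and do not come from the original answer.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 3, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

# Copy the index levels into ordinary columns so they can be grouped on.
levels = s.index.to_frame(index=False)

# For every row, the largest `second` found within its `first` group.
max_second = levels.groupby('first')['second'].transform('max')

# Boolean mask: True where the row's `second` equals its group's maximum.
mask = (levels['second'] == max_second).to_numpy()

# Boolean indexing keeps the original row order of s.
result = s[mask]
print(result)
```

With the index from the question, this keeps (bar, 2), (baz, 2), (foo, 2) and (qux, 3); the values differ from run to run because the data is random.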
Using sort_values + tail. Like idxmax, this keeps the row with the largest value for each first, and the result is ordered by value:
s.sort_values().groupby(level=0).tail(1)
Output:
first second
bar 2 -1.806466
baz 2 -0.776890
foo 1 -0.641193
qux 2 -0.455319
dtype: float64
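To select by the largest second instead of the largest value, the same tail idea can be applied after sorting by the index rather than by the values. This is a hedged variant, not part of the original answer, and it assumes each (first, second) pair appears only once:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 3, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

# sort_index orders rows by `first`, then by `second`, so the last row
# of each `first` group is the one with the largest `second`.
result = s.sort_index().groupby(level='first').tail(1)
print(result)
```

For the index in the question, the rows kept are (bar, 2), (baz, 2), (foo, 2) and (qux, 3), which matches the desired output.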