
How to filter a MultiIndex dataframe based on 1st level max values?

I have the following Series s with a two-level MultiIndex:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 3, 2]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)

first  second
bar    1        -0.493897
       2        -0.274826
baz    1        -0.337298
       2        -0.564097
foo    1        -1.545826
       2         0.159494
qux    3        -0.876819
       2         0.780388
dtype: float64

I would like to convert it to:

first  second
bar    2        -0.274826
baz    2        -0.564097
foo    2         0.159494
qux    3        -0.876819
dtype: float64

That is, for every value of first, keep only the row with the largest second.

I tried s.groupby(level=1).apply(max), but this returns:

second
1   -0.337298
2    0.780388
dtype: float64

Clearly my attempt groups by second and returns the max value in each of those groups, instead of keeping the max second for each first.

Any idea how to do this?

Use idxmax and index the Series with the resulting labels. Note that this keeps the row with the largest value in each first group:

s[s.groupby(level=0).idxmax()]

Output:

first  second
bar    2         0.482328
baz    1         0.244788
foo    2         1.310233
qux    2         0.297813
dtype: float64
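
The idxmax output above keeps the largest value per first (baz keeps second = 1), whereas the question's expected output keeps the largest second label. A minimal sketch of the label-based variant follows, assuming the values printed in the question are used in place of random data; the names idx, is_max_second and result are illustrative and not from the original thread.

```python
import pandas as pd

# Fixed values copied from the question so the result is reproducible.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 3, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series([-0.493897, -0.274826, -0.337298, -0.564097,
               -1.545826, 0.159494, -0.876819, 0.780388], index=index)

# Turn the index levels into ordinary columns (one row per entry of s).
idx = s.index.to_frame(index=False)

# True where a row's `second` equals the largest `second` within its `first`.
is_max_second = idx['second'] == idx.groupby('first')['second'].transform('max')

# Boolean indexing with a plain array, so no index alignment is involved.
result = s[is_max_second.to_numpy()]
print(result)
```

If the same maximal second occurred more than once within a first, this sketch would keep every such row.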

Using sort_values + tail (this also keeps the largest value per first):

s.sort_values().groupby(level=0).tail(1)
Out[33]: 
first  second
bar    2        -1.806466
baz    2        -0.776890
foo    1        -0.641193
qux    2        -0.455319
dtype: float64
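
Sorting by value and taking the last row per group selects the largest value, not the largest second. If the largest second label is what is wanted, sorting by the index instead may be enough. The sketch below contrasts the two, again assuming the values printed in the question; by_label and by_value are hypothetical names used only for illustration.

```python
import pandas as pd

# Fixed values copied from the question so the result is reproducible.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          [1, 2, 1, 2, 1, 2, 3, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series([-0.493897, -0.274826, -0.337298, -0.564097,
               -1.545826, 0.159494, -0.876819, 0.780388], index=index)

# Sort by (first, second); the last row of each `first` then has the max `second`.
by_label = s.sort_index().groupby(level='first').tail(1)

# Sort by value; the last row of each `first` then has the max value instead.
by_value = s.sort_values().groupby(level='first').tail(1)

print(by_label)
print(by_value)
```

With these values, by_label keeps qux with second = 3, while by_value keeps qux with second = 2.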


 