
Get the index of the minimum of a multi-index Pandas DataFrame using level

I have a multi-indexed Pandas DataFrame and want to find the minimum value of a certain column within a subset of rows for each level, and then get the entire contents of those rows.

import pandas as pd

# Name the levels so that level='level1' works in the snippets below
idx = pd.MultiIndex.from_product([['v1', 'v2'],
                                  ['record' + str(i) for i in range(1, 7)]],
                                 names=['level1', 'level2'])

df = pd.DataFrame([[2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140],
                   [2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140]],
                  columns=['col1', 'col2'],
                  index=idx)

My structure:

                 col1  col2
level1 level2
v1     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140
v2     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140

Example desired output: among the rows where col1 == 5, I want the row with the minimum value of the other column (col2) for each level1 value:

                 col1  col2
level1 level2
v1     record5    5.0   114
v2     record5    5.0   114

I know that I can get a subset of rows by using a boolean comparison:

df.loc[df['col1'] == 5]
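A minimal sketch of the same filter on current pandas, where .ix was removed in 1.0. It rebuilds the question's frame (the level names level1 and level2 are an assumption, matching the printed structure) and shows that DataFrame.query selects the same rows as the boolean mask:

```python
import pandas as pd

# Rebuild the question's frame; the level names level1/level2 are assumed.
idx = pd.MultiIndex.from_product(
    [["v1", "v2"], ["record" + str(i) for i in range(1, 7)]],
    names=["level1", "level2"],
)
rows = [[2.0, 114], [2.0, 1140], [3.0, 114], [3.0, 1140], [5.0, 114], [5.0, 1140]]
df = pd.DataFrame(rows * 2, columns=["col1", "col2"], index=idx)

mask = df["col1"] == 5        # boolean Series aligned on the MultiIndex
subset = df.loc[mask]         # .loc replaces the removed .ix indexer
same = df.query("col1 == 5")  # equivalent filter written as a string expression

print(subset)
```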

And I also know that I can get the minimum value of a column within that subset for each level:

df['col2'][df['col1'] == 5].min(level='level1')  # pandas < 2.0; newer versions: .groupby(level='level1').min()
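Series.min(level=...) was deprecated in pandas 1.3 and removed in 2.0. A minimal sketch of the groupby spelling that replaces it, again assuming the levels are named level1 and level2:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["v1", "v2"], ["record" + str(i) for i in range(1, 7)]],
    names=["level1", "level2"],
)
rows = [[2.0, 114], [2.0, 1140], [3.0, 114], [3.0, 1140], [5.0, 114], [5.0, 1140]]
df = pd.DataFrame(rows * 2, columns=["col1", "col2"], index=idx)

# Restrict col2 to the col1 == 5 rows, then take the minimum within each level1 group.
per_level_min = df.loc[df["col1"] == 5, "col2"].groupby(level="level1").min()
print(per_level_min)
```

This gives the minimum values only; it does not say which row each minimum came from, which is why the question turns to idxmin next.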

And if I specify the level, I can get the index of the one matching row on that specific level:

df.loc[('v1', df.loc['v1']['col2'][df.loc['v1']['col1'] == 5].idxmin()), :]
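For a single outer label, DataFrame.xs is one way to take the cross-section before calling idxmin. A minimal sketch on the question's data (level names assumed as level1 and level2); it finds the level2 label inside 'v1' and then rebuilds the full key to fetch the row:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["v1", "v2"], ["record" + str(i) for i in range(1, 7)]],
    names=["level1", "level2"],
)
rows = [[2.0, 114], [2.0, 1140], [3.0, 114], [3.0, 1140], [5.0, 114], [5.0, 1140]]
df = pd.DataFrame(rows * 2, columns=["col1", "col2"], index=idx)

# Cross-section for one outer label; xs drops level1, leaving a level2 index.
v1 = df.xs("v1", level="level1")
# level2 label of the smallest col2 among that label's col1 == 5 rows.
best = v1.loc[v1["col1"] == 5, "col2"].idxmin()
# Rebuild the full (level1, level2) key to fetch the entire row as a one-row DataFrame.
row = df.loc[[("v1", best)]]
print(row)
```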

But I cannot figure out whether there is an efficient way to get these indexes for all levels at once.

There does not seem to be a method along the lines of this:

df['col2'][df['col1'] == 5].idxmin(level='level1')

I can get what I want with this:

df.loc[
  (df['col1'] == 5) &
  (df['col2'].isin(df['col2'][df['col1'] == 5].groupby(level='level1').min().values))
]
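One thing worth checking with the isin approach: it compares col2 against the pooled minima of every group, so a row in one group can match another group's minimum. A sketch on a small hypothetical frame (not the question's data) where v2's minimum, 1140, also appears as a non-minimal value in v1:

```python
import pandas as pd

# Hypothetical data: v1's minimum is 114, v2's minimum is 1140.
idx = pd.MultiIndex.from_tuples(
    [("v1", "record5"), ("v1", "record6"), ("v2", "record5"), ("v2", "record6")],
    names=["level1", "level2"],
)
df = pd.DataFrame({"col1": [5.0, 5.0, 5.0, 5.0], "col2": [114, 1140, 1140, 2000]}, index=idx)

mask = df["col1"] == 5
per_level_min = df.loc[mask, "col2"].groupby(level="level1").min()

# isin checks against [114, 1140] for every group, so v1/record6 (1140) slips in.
isin_rows = df.loc[mask & df["col2"].isin(per_level_min.values)]

# idxmin works per group, so it returns exactly one row per level1 value.
idxmin_rows = df.loc[df.loc[mask].groupby(level="level1")["col2"].idxmin()]

print(isin_rows)
print(idxmin_rows)
```

With the question's data every group's minimum is 114, so both approaches happen to agree there.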

But given everything else Pandas offers, is there a better way to get this output?

This should work:

df.loc[df.loc[df.col1 == 5.].groupby(level=0).col2.idxmin()]

            col1  col2
v1 record5   5.0   114
v2 record5   5.0   114

Note

I'm using idxmin as you thought you ought to, but the context matters: I call it after groupby(level=0), and groupby(level=0).col2.idxmin() behaves the way you expected col2.idxmin(level=...) to.
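A self-contained sketch of the answer's one-liner, split into steps so the intermediate result is visible (level names assumed as level1 and level2). The groupby idxmin returns, for each outer label, the complete (level1, level2) label of the winning row, which is what lets .loc pull back the entire rows:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["v1", "v2"], ["record" + str(i) for i in range(1, 7)]],
    names=["level1", "level2"],
)
rows = [[2.0, 114], [2.0, 1140], [3.0, 114], [3.0, 1140], [5.0, 114], [5.0, 1140]]
df = pd.DataFrame(rows * 2, columns=["col1", "col2"], index=idx)

subset = df.loc[df["col1"] == 5]
# One full (level1, level2) label per outer group: the row holding that group's smallest col2.
labels = subset.groupby(level=0)["col2"].idxmin()
print(labels)

# Selecting by those labels returns entire rows (col1 and col2), not just col2.
result = df.loc[labels]
print(result)
```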

>>> (df[df.col1 == 5]
...      .groupby(level=0, as_index=False).col2
...      .apply(lambda group: group.nsmallest(1)))
0  v1  record5    114
1  v2  record5    114
dtype: int64

Or:

>>> df[df.col1 == 5].groupby(level=0).col2.nsmallest(1)
v1  v1  record5    114
v2  v2  record5    114
dtype: int64

But I'm not sure why the first level shows up twice (i.e. 'v1' 'v1' ...).
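The repeated level most likely comes from how groupby assembles the per-group results of nsmallest: each group's result keeps its original (level1, level2) index, and with the default group_keys=True pandas prepends the group key in front of it. A minimal sketch (level names assumed as level1 and level2) that removes the extra copy with droplevel(0):

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["v1", "v2"], ["record" + str(i) for i in range(1, 7)]],
    names=["level1", "level2"],
)
rows = [[2.0, 114], [2.0, 1140], [3.0, 114], [3.0, 1140], [5.0, 114], [5.0, 1140]]
df = pd.DataFrame(rows * 2, columns=["col1", "col2"], index=idx)

# Group key + each group's own (level1, level2) index -> three index levels.
out = df[df["col1"] == 5].groupby(level=0)["col2"].nsmallest(1)
print(out)

# Drop the prepended group key so the index matches the original MultiIndex again.
fixed = out.droplevel(0)
print(fixed)
```

Note that this route returns only col2; the idxmin one-liner at the top of the answer returns the full rows.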

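Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.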

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM