
DataFrame MultiIndex - find column by value

I have a MultiIndex DataFrame with two index levels and roughly 100 columns. I would like to get groups of values (organized in columns) based on the presence of a certain value, but I am still struggling with the indexing mechanics.

Here is some example data:

import numpy as np
import pandas as pd

index_arrays = [np.array(["one"]*5+["two"]*5), 
                np.array(["aaa","bbb","ccc","ddd","eee"]*2)]

df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],
                   [10,11,12],[13,14,15],[16,1,17],
                   [18,19,20],[21,22,23],[24,25,26],
                   [27,28,29]], index=index_arrays)

This gives:

          0   1   2
one aaa   1   2   3
    bbb   4   5   6
    ccc   7   8   9
    ddd  10  11  12
    eee  13  14  15
two aaa  16   1  17
    bbb  18  19  20
    ccc  21  22  23
    ddd  24  25  26
    eee  27  28  29

Now, for each level-0 index value (one and two), I want to return the entire column in which the level-1 index aaa equals a certain value, for example 1. What I have so far is this:

df[df.loc[(slice(None), "aaa"),:]==1].any(axis=1)

which returns:

one  aaa     True
     bbb    False
     ccc    False
     ddd    False
     eee    False
two  aaa     True
     bbb    False
     ccc    False
     ddd    False
     eee    False

Instead of the boolean values, I would like to retrieve the actual values. The expected output would be:

expected:
          0
one aaa   1
    bbb   4
    ccc   7
    ddd  10
    eee  13
two aaa   1
    bbb  19
    ccc  22
    ddd  25
    eee  28

I would appreciate your help.

Bonus question: it would also be great to know which column contains the values in question. For the example above, this would be column 0 (for index one) and column 1 (for index two). Is there a way to do this? Thanks!

Let's try with DataFrame.xs, building a boolean mask m that marks the columns whose aaa row equals 1:

m = df.xs('aaa', level=1).eq(1).any()

Or with pd.IndexSlice:

m = df.loc[pd.IndexSlice[:, 'aaa'], :].eq(1).any()

Then select the columns flagged by m (every column whose aaa row equals 1 in at least one group):

df.loc[:, m]

          0   1
one aaa   1   2
    bbb   4   5
    ccc   7   8
    ddd  10  11
    eee  13  14
two aaa  16   1
    bbb  18  19
    ccc  21  22
    ddd  24  25
    eee  27  28
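Note that df.loc[:, m] keeps every column that matches in any group, so both column 0 and column 1 appear for one and two alike. If each group should keep only its own matching column, as in the expected output, one possible vectorized sketch is to broadcast the per-group mask to all rows of that group and then collapse each row to its surviving value. It assumes each group matches in exactly one column; the names target, group_mask, row_mask and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [10, 11, 12], [13, 14, 15], [16, 1, 17],
                   [18, 19, 20], [21, 22, 23], [24, 25, 26],
                   [27, 28, 29]], index=index_arrays)

target = 1  # hypothetical name for the value to look for

# One row per level-0 group: True where that group's 'aaa' row equals the target
group_mask = df.xs('aaa', level=1).eq(target)

# Repeat each group's mask for every row of that group (group_mask's index is unique)
row_mask = group_mask.reindex(df.index.get_level_values(0)).to_numpy()

# Blank out non-matching cells, then back-fill along each row so the surviving
# value lands in the first column; keep that column as a one-column DataFrame
result = df.where(row_mask).bfill(axis=1).iloc[:, 0].to_frame()
print(result)
```

If a group matches in more than one column, the back-fill keeps the leftmost one; if it matches in none, that group's rows come out as NaN. The values are floats because df.where introduces NaN before collapsing.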

The labels of those columns (for the bonus question) are:

df.columns[m]

Int64Index([0, 1], dtype='int64')
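Because m is combined across both groups, df.columns[m] does not say which column belongs to which group. A minimal per-group sketch for the bonus question reads the answer straight from the xs result; aaa_matches and matching_columns are just illustrative names.

```python
import numpy as np
import pandas as pd

# Example data from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [10, 11, 12], [13, 14, 15], [16, 1, 17],
                   [18, 19, 20], [21, 22, 23], [24, 25, 26],
                   [27, 28, 29]], index=index_arrays)

# Boolean frame: one row per level-0 group, True where the 'aaa' row equals 1
aaa_matches = df.xs('aaa', level=1).eq(1)

# For each group, collect the column labels whose 'aaa' value equals 1
matching_columns = {
    group: aaa_matches.columns[row.to_numpy()].tolist()
    for group, row in aaa_matches.iterrows()
}

for group, cols in matching_columns.items():
    print(group, cols)  # prints "one [0]" then "two [1]"
```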

This might be what you're looking for, shown here for the level-0 value one:

df.loc[df.index.get_level_values(0) == 'one', df.loc[('one', 'aaa')] == 1]

This outputs:

          0
one aaa   1
    bbb   4
    ccc   7
    ddd  10
    eee  13

To combine the results for all of the different values of the first index, generate these DataFrames and concatenate them:

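frames = []
for level_0_val in df.index.get_level_values(0).unique():
    # rows of this level-0 group, limited to the columns where its 'aaa' row equals 1
    frames.append(df.loc[df.index.get_level_values(0) == level_0_val,
                         df.loc[(level_0_val, 'aaa')] == 1])
# DataFrame.append was removed in pandas 2.0; pd.concat combines the pieces instead
output_df = pd.concat(frames)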

Here is output_df:

            0     1
one aaa   1.0   NaN
    bbb   4.0   NaN
    ccc   7.0   NaN
    ddd  10.0   NaN
    eee  13.0   NaN
two aaa   NaN   1.0
    bbb   NaN  19.0
    ccc   NaN  22.0
    ddd   NaN  25.0
    eee   NaN  28.0

You can then collapse these columns into one to get your desired output.
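One way to do that collapse, assuming each row of output_df has exactly one non-NaN cell (true for this example), is to back-fill along the columns and keep the first one. The sketch below rebuilds output_df with pd.concat so it runs on its own; collapsed is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Example data from the question
index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [10, 11, 12], [13, 14, 15], [16, 1, 17],
                   [18, 19, 20], [21, 22, 23], [24, 25, 26],
                   [27, 28, 29]], index=index_arrays)

# Rebuild output_df: one frame per level-0 group, then concatenate
frames = []
for level_0_val in df.index.get_level_values(0).unique():
    frames.append(df.loc[df.index.get_level_values(0) == level_0_val,
                         df.loc[(level_0_val, 'aaa')] == 1])
output_df = pd.concat(frames)

# Back-fill along each row so its single non-NaN value reaches column 0,
# then keep only that column (as a DataFrame, matching the expected layout)
collapsed = output_df.bfill(axis=1).iloc[:, [0]]
print(collapsed)
```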
