I have a multiindex dataframe with two layers of indices and roughly 100 columns. I would like to get groups of values (organized in columns) based on the presence of a certain value, but I am still struggling with the indexing mechanics.
Here is some example data:
import numpy as np
import pandas as pd

# Two-level row index: level 0 is "one"/"two", level 1 is "aaa".."eee"
index_arrays = [np.array(["one"]*5 + ["two"]*5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"]*2)]
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],
                   [10,11,12],[13,14,15],[16,1,17],
                   [18,19,20],[21,22,23],[24,25,26],
                   [27,28,29]], index=index_arrays)
This gives:
          0   1   2
one aaa   1   2   3
    bbb   4   5   6
    ccc   7   8   9
    ddd  10  11  12
    eee  13  14  15
two aaa  16   1  17
    bbb  18  19  20
    ccc  21  22  23
    ddd  24  25  26
    eee  27  28  29
Now, for each level_0 index (one and two), I want to return the entire column in which the row with level_1 index aaa equals a certain value, for example 1. What I have so far is this:
df[df.loc[(slice(None), "aaa"),:]==1].any(axis=1)
one  aaa     True
     bbb    False
     ccc    False
     ddd    False
     eee    False
two  aaa     True
     bbb    False
     ccc    False
     ddd    False
     eee    False
Instead of the boolean values, I would like to retrieve the actual values. The expected output would be:
          0
one aaa   1
    bbb   4
    ccc   7
    ddd  10
    eee  13
two aaa   1
    bbb  19
    ccc  22
    ddd  25
    eee  28
I would appreciate your help.
Bonus question: Additionally, it would be great to know which column contains the values in question. For the example above, this would be column 0 (for index one) and column 1 (for index two). Is there a way to do this? Thanks!
Let's try with DataFrame.xs:
m = df.xs('aaa', level=1).eq(1).any()
Or with pd.IndexSlice:
m = df.loc[pd.IndexSlice[:, 'aaa'], :].eq(1).any()
Result:
df.loc[:, m]
          0   1
one aaa   1   2
    bbb   4   5
    ccc   7   8
    ddd  10  11
    eee  13  14
two aaa  16   1
    bbb  18  19
    ccc  21  22
    ddd  24  25
    eee  27  28
df.columns[m]
Int64Index([0, 1], dtype='int64')
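Note that the mask m above keeps every column that matches in at least one group, so both columns come back for both groups. To get the single-column result from the question and answer the bonus question, one option is to pick the matching column per group. The following is only a sketch: the names hits, matching_col and result are made up for illustration, and it assumes every level_0 group has at least one column where its aaa row equals 1 (idxmax would otherwise silently return the first column).

```python
import numpy as np
import pandas as pd

index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [10, 11, 12], [13, 14, 15], [16, 1, 17],
                   [18, 19, 20], [21, 22, 23], [24, 25, 26],
                   [27, 28, 29]], index=index_arrays)

# One row per level_0 group: True where that group's 'aaa' row equals 1.
hits = df.xs("aaa", level=1).eq(1)

# Bonus question: label of the first matching column in each group.
# Assumption: every group has a match.
matching_col = hits.idxmax(axis=1)

# For each group, keep only its matching column, then stack the pieces.
result = pd.concat(
    [grp.loc[:, matching_col[key]] for key, grp in df.groupby(level=0)]
)

print(matching_col)
print(result)
```

Here matching_col maps one to column 0 and two to column 1, and result is a single Series that keeps the original two-level index.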
This might be what you're looking for:
df.loc[df.index.get_level_values(0) == 'one', df.loc[('one', 'aaa')] == 1]
This outputs:
          0
one aaa   1
    bbb   4
    ccc   7
    ddd  10
    eee  13
To combine the results for all of the different values of the first index, generate these DataFrames and concatenate them:
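frames = []
for level_0_val in df.index.get_level_values(0).unique():
    # Rows of this level_0 group, columns where its 'aaa' row equals 1
    frames.append(df.loc[df.index.get_level_values(0) == level_0_val, df.loc[(level_0_val, 'aaa')] == 1])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
output_df = pd.concat(frames)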
Here is output_df:
            0     1
one aaa   1.0   NaN
    bbb   4.0   NaN
    ccc   7.0   NaN
    ddd  10.0   NaN
    eee  13.0   NaN
two aaa   NaN   1.0
    bbb   NaN  19.0
    ccc   NaN  22.0
    ddd   NaN  25.0
    eee   NaN  28.0
You can then generate your desired output from this.
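One possible way to do that last step is sketched below. It assumes each row of output_df holds exactly one non-NaN value, as in the example; the names collapsed and source_col are illustrative only. Back-filling across columns moves each row's value into the first column, and notna().idxmax(axis=1) reports which column it came from.

```python
import numpy as np
import pandas as pd

index_arrays = [np.array(["one"] * 5 + ["two"] * 5),
                np.array(["aaa", "bbb", "ccc", "ddd", "eee"] * 2)]
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                   [10, 11, 12], [13, 14, 15], [16, 1, 17],
                   [18, 19, 20], [21, 22, 23], [24, 25, 26],
                   [27, 28, 29]], index=index_arrays)

# Rebuild output_df: per level_0 group, the columns whose 'aaa' row equals 1.
lvl0 = df.index.get_level_values(0)
frames = [df.loc[lvl0 == key, df.loc[(key, "aaa")] == 1] for key in lvl0.unique()]
output_df = pd.concat(frames)

# Assumption: exactly one non-NaN value per row. Back-fill along the columns
# so that value lands in the first column, then restore the integer dtype.
collapsed = output_df.bfill(axis=1).iloc[:, 0].astype(int)

# Column label that held the non-NaN value in each row.
source_col = output_df.notna().idxmax(axis=1)

print(collapsed)
print(source_col)
```

If a group could match in more than one column, a row would hold several non-NaN values and this collapse would keep only the leftmost one.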