
Extract a subset of a pandas MultiIndex DataFrame

>new_dat=dat_corr.merge(dat_class,on="Asset",how="right").set_index(['Country','Class','Asset'])
>new_dat.shape
(89, 89) 
>temp1='UNITEDSTATES' 
>temp2='Equity'
>new_dat.loc[ (new_dat.index.get_level_values('Country').isin([temp1]) & new_dat.index.get_level_values('Class').isin([temp2]))]
>new_dat.columns=new_dat.index

The .loc selection gives me [3 rows x 89 columns]. My 89 columns are a mix of Equity/FX/FI/Commodities. If I want only USA equities against all other equities, rather than the entire 89 columns, how do I do it? I have also added a MultiIndex for the columns. So the question is: now that I have a MultiIndex along both rows and columns, how do I use it for filtering? Below is a small subset of the data:

Country                       UNITEDSTATES                                    CANADA  \
Class                               Equity                                    Equity
Asset                             DJ1Index      SP1Index      ND1Index      PT1Index
Country      Class  Asset
UNITEDSTATES Equity DJ1Index      1.000000      0.958038      0.747192      0.648373
                    SP1Index      0.958038      1.000000      0.825458      0.717545
                    ND1Index      0.747192      0.825458      1.000000      0.612487
CANADA       Equity PT1Index      0.648373      0.717545      0.612487      1.000000
MEXICO       Equity IS1Index      0.622570      0.664499      0.565702      0.575618

Country                             MEXICO        BRAZIL       GERMANY       BRITAIN  \
Class                               Equity        Equity        Equity        Equity
Asset                             IS1Index      BZ1Index      VG1Index       Z1Index
Country      Class  Asset
UNITEDSTATES Equity DJ1Index      0.622570      0.523704      0.566993      0.520526
                    SP1Index      0.664499      0.565941      0.587933      0.539138
                    ND1Index      0.565702      0.484441      0.458135      0.391391
CANADA       Equity PT1Index      0.575618      0.526663      0.499343      0.493260
MEXICO       Equity IS1Index      1.000000      0.577041      0.502558      0.487487

You can pass your column(s) to .loc after a comma, like this:

df.loc[(cond1) & (cond2), 'column_name']

This returns your df filtered by your conditions, keeping only the single column column_name.

You can select multiple columns by putting them in a list:

df.loc[(cond1) & (cond2), ['column_name1', 'column_name2']]
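As a minimal sketch of the two forms above, assuming a frame with MultiIndex rows and ordinary single-level columns: the labels and values are borrowed from the sample in the question, while the frame itself and the names one_col and two_cols are only illustrative.

```python
import pandas as pd

# Illustrative frame: MultiIndex rows, plain single-level columns.
# Labels and values are copied from the sample shown in the question.
index = pd.MultiIndex.from_tuples(
    [
        ("UNITEDSTATES", "Equity", "DJ1Index"),
        ("UNITEDSTATES", "Equity", "SP1Index"),
        ("CANADA", "Equity", "PT1Index"),
    ],
    names=["Country", "Class", "Asset"],
)
df = pd.DataFrame(
    {
        "DJ1Index": [1.000000, 0.958038, 0.648373],
        "SP1Index": [0.958038, 1.000000, 0.717545],
        "PT1Index": [0.648373, 0.717545, 1.000000],
    },
    index=index,
)

# Boolean row conditions built from the index levels, as in the question.
cond1 = df.index.get_level_values("Country").isin(["UNITEDSTATES"])
cond2 = df.index.get_level_values("Class").isin(["Equity"])

# A single label after the comma gives a Series.
one_col = df.loc[cond1 & cond2, "PT1Index"]

# A list of labels after the comma gives a DataFrame.
two_cols = df.loc[cond1 & cond2, ["DJ1Index", "PT1Index"]]
```

Here one_col holds the two UNITEDSTATES rows of the PT1Index column, and two_cols holds the same rows restricted to the two listed columns.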

You can see the docs for more details.

EDIT:

If your columns are also a MultiIndex, you can use IndexSlice for that:

import pandas as pd
idx = pd.IndexSlice
df.loc[(cond1) & (cond2), idx[:,'column_name']]

Note that idx[:,'column_name'] should be adjusted to your MultiIndex setup, i.e. you need either : or the column name(s) for every level of the MultiIndex.
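Applied to the three-level layout in the question (Country, Class, Asset on both axes), the selection might look like the sketch below. The equity labels and values come from the question's sample; the FX asset "XX1Curncy" and its numbers are hypothetical, added only so that there is a non-equity column to filter out.

```python
import pandas as pd

names = ["Country", "Class", "Asset"]
rows = pd.MultiIndex.from_tuples(
    [
        ("UNITEDSTATES", "Equity", "DJ1Index"),
        ("UNITEDSTATES", "Equity", "SP1Index"),
        ("CANADA", "Equity", "PT1Index"),
    ],
    names=names,
)
cols = pd.MultiIndex.from_tuples(
    [
        ("UNITEDSTATES", "Equity", "DJ1Index"),
        ("UNITEDSTATES", "Equity", "SP1Index"),
        ("CANADA", "Equity", "PT1Index"),
        ("CANADA", "FX", "XX1Curncy"),  # hypothetical asset, not in the question
    ],
    names=names,
)
data = [
    [1.000000, 0.958038, 0.648373, 0.10],  # last column: made-up values
    [0.958038, 1.000000, 0.717545, 0.20],
    [0.648373, 0.717545, 1.000000, 0.30],
]
df = pd.DataFrame(data, index=rows, columns=cols)

# Sort both axes first so that MultiIndex slicing is safe.
df = df.sort_index(axis=0).sort_index(axis=1)

idx = pd.IndexSlice
# Rows: UNITEDSTATES / Equity / any asset.
# Columns: any country / Equity / any asset.
us_vs_equity = df.loc[
    idx["UNITEDSTATES", "Equity", :],
    idx[:, "Equity", :],
]
```

Each selector supplies one entry per level: a label where the level should be fixed and : where it should be left open. The result keeps the two UNITEDSTATES equity rows and drops the FX column.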

You might find useful examples of how to use this in the MultiIndex docs. Note the warnings there about needing a lexsorted index, and that your pandas version should be 0.14 or later.
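One way to check and fix the sort order, sketched on a small hypothetical frame with placeholder values: is_monotonic_increasing reports whether a MultiIndex is in lexicographic order, and sort_index(axis=1) reorders the columns.

```python
import pandas as pd

# Hypothetical two-level columns, deliberately out of order.
cols = pd.MultiIndex.from_tuples(
    [("UNITEDSTATES", "Equity"), ("CANADA", "Equity"), ("CANADA", "FX")],
    names=["Country", "Class"],
)
df = pd.DataFrame([[1.0, 0.5, 0.1]], columns=cols)  # placeholder values

# UNITEDSTATES comes before CANADA, so the columns are not sorted yet.
was_sorted = df.columns.is_monotonic_increasing

# Sort the column MultiIndex lexicographically; use axis=0 for the rows.
df = df.sort_index(axis=1)
now_sorted = df.columns.is_monotonic_increasing
```

The same check applies to df.index when the rows carry the MultiIndex.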

If you post a reproducible example of the DataFrame, it will be easier to give a more concrete answer.

