Randomly sample levels in a MultiIndexed DataFrame

Hi, I have a MultiIndex DataFrame like the one below, and I want to randomly select part of it according to ID_1.

Here is my DataFrame:

ID_1 ID_2 feature_1 feature_2 
  1    1      0        0
       2      1        1 
  2    1      1        1 
       2      0        1    
  3    1      1        1 
       2      0        1  
  4    1      1        1 
       2      0        1  

I want to select 2 of the 4 ID_1 values, keeping all of their rows. Example result:

ID_1 ID_2 feature_1 feature_2 
  2    1      1        1 
       2      0        1    
  4    1      1        1 
       2      0        1  

What is the best way to do this? Thank you.
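To make the question reproducible, here is one way to build a frame matching the table above. This is a sketch that assumes ID_1 and ID_2 form the index and that all values are plain integers; the variable name `index` is only illustrative.

```python
import pandas as pd

# Rebuild the example frame: ID_1 is the outer index level, ID_2 the inner one,
# and every (ID_1, ID_2) pair from {1, 2, 3, 4} x {1, 2} appears once.
index = pd.MultiIndex.from_product([[1, 2, 3, 4], [1, 2]], names=["ID_1", "ID_2"])
df = pd.DataFrame(
    {
        "feature_1": [0, 1, 1, 0, 1, 0, 1, 0],
        "feature_2": [0, 1, 1, 1, 1, 1, 1, 1],
    },
    index=index,
)
print(df)
```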

Use np.random.choice to select 2 labels at random from df.index.levels[0], then pass the selected labels to df.loc to pull out every row that belongs to them.

df
           feature_1  feature_2
ID_1 ID_2                      
1    1             0          0
     2             1          1
2    1             1          1
     2             0          1
3    1             1          1
     2             0          1
4    1             1          1
     2             0          1

import numpy as np

# np.random.seed(0)  # Uncomment to make results reproducible.
df.loc[np.random.choice(df.index.levels[0], 2, replace=False)]

           feature_1  feature_2
ID_1 ID_2                      
3    1             1          1
     2             0          1
4    1             1          1
     2             0          1
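A self-contained variant of the same idea, offered as a sketch rather than the answerer's exact code. It swaps in NumPy's `Generator` API (`np.random.default_rng`) for the legacy `np.random.choice`, and reads the labels from `df.index.unique(level=0)` instead of `df.index.levels[0]`: pandas keeps all defined level values on a MultiIndex even after rows are filtered out, so `levels[0]` can list IDs that no longer have any rows. The helper name `sample_outer_groups` and its `seed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def sample_outer_groups(df: pd.DataFrame, n: int, seed=None) -> pd.DataFrame:
    """Return all rows belonging to n randomly chosen first-level labels (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # unique(level=0) lists only labels that actually occur in the index,
    # unlike df.index.levels[0], which may keep unused labels after filtering.
    labels = df.index.unique(level=0).to_numpy()
    chosen = rng.choice(labels, size=n, replace=False)
    # Sort the chosen labels so the groups come out in ascending label order.
    return df.loc[np.sort(chosen).tolist()]


# Example frame matching the question.
index = pd.MultiIndex.from_product([[1, 2, 3, 4], [1, 2]], names=["ID_1", "ID_2"])
df = pd.DataFrame(
    {"feature_1": [0, 1, 1, 0, 1, 0, 1, 0], "feature_2": [0, 1, 1, 1, 1, 1, 1, 1]},
    index=index,
)

sample = sample_outer_groups(df, 2, seed=0)
print(sample)
```

Passing a fixed `seed` plays the same role as the commented-out `np.random.seed(0)` line: the same two IDs come back on every run, without touching NumPy's global random state.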

If you need to do the same thing for the second level (ID_2), use pd.IndexSlice to slice on that level.

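import numpy as np
import pandas as pd

v = np.random.choice(df.index.levels[1], 2, replace=False)  # 2 distinct ID_2 labels
df.loc[pd.IndexSlice[:, v], :]  # every row whose ID_2 is in v, for all ID_1
# Equivalent: df.loc(axis=0)[pd.IndexSlice[:, v]]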
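An alternative for the second level, sketched here with a boolean mask built from `get_level_values` and `isin` instead of `pd.IndexSlice`; a boolean mask does not depend on how the index is ordered. Because the example has only two ID_2 values (1 and 2), drawing 2 of them without replacement returns every row, so this sketch draws 1 to make the filtering visible. The seed value 0 is arbitrary, and the final line cross-checks the mask against the IndexSlice form from the answer.

```python
import numpy as np
import pandas as pd

# Example frame matching the question.
index = pd.MultiIndex.from_product([[1, 2, 3, 4], [1, 2]], names=["ID_1", "ID_2"])
df = pd.DataFrame(
    {"feature_1": [0, 1, 1, 0, 1, 0, 1, 0], "feature_2": [0, 1, 1, 1, 1, 1, 1, 1]},
    index=index,
)

rng = np.random.default_rng(0)
# Draw 1 distinct ID_2 label from the labels actually present in the index.
v = rng.choice(df.index.unique(level="ID_2").to_numpy(), size=1, replace=False)

# Boolean mask on the second level: True where the row's ID_2 is in v.
mask = df.index.get_level_values("ID_2").isin(v)
subset = df[mask]

# The IndexSlice form from the answer should select the same rows.
same = df.loc[pd.IndexSlice[:, v], :]
print(subset)
```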

The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM