
Conditional Slicing from Columns in Pandas MultiIndex

I am trying to conditionally slice data from a DataFrame with MultiIndex columns, filtering on a column's values rather than on the index. For example, I have the following DataFrame:

   203        204         205
  TIME VALUE TIME VALUE  TIME VALUE
0    1   bar  1.0   LH2  10.0   dog
1    2   baz  2.0   LOX  11.0   cat
2    3   foo  3.0   CH4  12.0   pig
3    4   qux  NaN   NaN  13.0   rat
4    5   qaz  NaN   NaN   NaN   NaN
5    6   qoo  NaN   NaN   NaN   NaN

(I essentially have measurement data (203, 204, etc.), each with a time and a value, recorded at different sample rates, so the number of rows differs from one measurement to the next. I am putting all the data into a single DataFrame with MultiIndex columns because shorter measurements can simply be padded with NaN.)

I want to select, for each measurement, all rows where TIME is greater than 3. The expected output would be the following:

   203        204         205
  TIME VALUE TIME VALUE  TIME VALUE
0    4   qux  NaN   NaN  10.0   dog
1    5   qaz  NaN   NaN  11.0   cat
2    6   qoo  NaN   NaN  12.0   pig
3  NaN   NaN  NaN   NaN  13.0   rat
4   NaN  NaN  NaN   NaN   NaN   NaN
5   NaN  NaN  NaN   NaN   NaN   NaN

I tried using the query method but that only works on an index, not a column name. I do not want to transpose the dataframe to use query. I also tried using loc but never seemed to find a way to get what I am looking for. I even looked into using xs but I don't think I can add conditional slicing with it.

I found this on SO but it doesn't include conditional slicing: Selecting columns from pandas MultiIndex

Here is the code that I have been using to test this:

import pandas as pd
import numpy as np

d1 = {'TIME': [1,2,3,4,5,6], 'VALUE': ['bar', 'baz', 'foo', 'qux', 'qaz', 'qoo']}
df1 = pd.DataFrame(data=d1)

d2 = {'TIME': [1,2,3], 'VALUE': ['LH2', 'LOX', 'CH4']}
df2 = pd.DataFrame(data=d2)

d3 = {'TIME': [10,11,12,13], 'VALUE': ['dog', 'cat', 'pig', 'rat']}
df3 = pd.DataFrame(data=d3)

df_list = [df1, df2, df3] 

pids = [203, 204, 205]

df_multi = pd.concat(df_list, axis=1, keys=pids)

print(df_multi)

# Slice all time columns
ALL = slice(None)
df_multi_2 = df_multi.loc[ALL, (ALL, 'TIME')]
print(df_multi_2)

# Condition based slicing - does not work
ALL = slice(None)
df_multi_3 = df_multi.loc[ALL, df_multi.loc[ALL,(ALL,'TIME')] > 3]
print(df_multi_3)

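You can use IndexSlice to select the TIME columns, compare them against the threshold, broadcast the result across each measurement's columns, and apply it with where: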

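from pandas import IndexSlice

# Compare every TIME column against the threshold, then spread the result
# across all columns that share the same top-level key (203, 204, ...).
# Note: groupby(axis=1) is deprecated in pandas 2.1 and later.
mask = (df_multi.loc[:, IndexSlice[:, "TIME"]].gt(3)
    .reindex(df_multi.columns, axis=1)   # add the VALUE columns back (as NaN)
    .groupby(level=0, axis=1)            # group columns by measurement id
    .transform('any')                    # True where that id's TIME is > 3
)

# Keep values where the mask is True; everything else becomes NaN
df_multi.where(mask)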

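Output (rows that fail the condition are set to NaN rather than removed, so the surviving values keep their original row positions):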

   203        204         205      
  TIME VALUE TIME VALUE  TIME VALUE
0  NaN   NaN  NaN   NaN  10.0   dog
1  NaN   NaN  NaN   NaN  11.0   cat
2  NaN   NaN  NaN   NaN  12.0   pig
3  4.0   qux  NaN   NaN  13.0   rat
4  5.0   qaz  NaN   NaN   NaN   NaN
5  6.0   qoo  NaN   NaN   NaN   NaN
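
If groupby(..., axis=1) is unavailable or emits a deprecation warning in your pandas version, the same mask can probably be built without it. The following is only a sketch under a few assumptions: the frame is rebuilt with keys=pids for brevity, and the names time_gt, level0 and filtered are made up for this example. It uses xs to get one boolean column per measurement id and then repeats that column for every column the id owns:

```python
import pandas as pd

# Rebuild the example frame from the question (keys given as a plain list).
df1 = pd.DataFrame({"TIME": [1, 2, 3, 4, 5, 6],
                    "VALUE": ["bar", "baz", "foo", "qux", "qaz", "qoo"]})
df2 = pd.DataFrame({"TIME": [1, 2, 3], "VALUE": ["LH2", "LOX", "CH4"]})
df3 = pd.DataFrame({"TIME": [10, 11, 12, 13],
                    "VALUE": ["dog", "cat", "pig", "rat"]})
df_multi = pd.concat([df1, df2, df3], axis=1, keys=[203, 204, 205])

# One boolean column per measurement id; a NaN TIME compares as False.
time_gt = df_multi.xs("TIME", axis=1, level=1).gt(3)

# Repeat each id's boolean column once for every column under that id,
# so the mask has exactly the same shape as df_multi.
level0 = df_multi.columns.get_level_values(0)
mask = time_gt.loc[:, level0].to_numpy()

# Keep values where the mask is True; everything else becomes NaN.
filtered = df_multi.where(mask)
print(filtered)
```

Passing a plain NumPy array to where sidesteps any column-label alignment, which is why the mask is converted with to_numpy() here.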

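The output above keeps each surviving value on its original row, whereas the expected output in the question has them shifted to the top of each column. One way to get that layout, sketched here rather than taken from the original answer, is to drop the NaN values column by column and renumber each column from 0. This assumes TIME and VALUE are missing on exactly the same rows within a measurement; if a kept row had a NaN VALUE of its own, the two columns would drift out of step. The names filtered and compacted are made up for this example.

```python
import pandas as pd

# Rebuild the example frame from the question (keys given as a plain list).
df1 = pd.DataFrame({"TIME": [1, 2, 3, 4, 5, 6],
                    "VALUE": ["bar", "baz", "foo", "qux", "qaz", "qoo"]})
df2 = pd.DataFrame({"TIME": [1, 2, 3], "VALUE": ["LH2", "LOX", "CH4"]})
df3 = pd.DataFrame({"TIME": [10, 11, 12, 13],
                    "VALUE": ["dog", "cat", "pig", "rat"]})
df_multi = pd.concat([df1, df2, df3], axis=1, keys=[203, 204, 205])

# Mask out every row whose TIME is not > 3, separately per measurement id.
time_gt = df_multi.xs("TIME", axis=1, level=1).gt(3)
level0 = df_multi.columns.get_level_values(0)
filtered = df_multi.where(time_gt.loc[:, level0].to_numpy())

# Drop the NaN entries of each column on its own and renumber from 0,
# which moves the surviving values to the top of the column.
compacted = pd.DataFrame(
    {col: filtered[col].dropna().reset_index(drop=True)
     for col in filtered.columns}
)

# Pad back to the original number of rows so the shape matches the input.
compacted = compacted.reindex(range(len(filtered)))
print(compacted)
```

Because every column now contains NaN padding, the TIME columns come back as floats (4.0 rather than 4).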