
Create list of column names in multi-index pandas dataframe

I have a tortuous list of column names in a dataframe that I'm reading from an Excel sheet. The data is being imported as a multi-indexed dataframe, with two column label levels. I would like to create a list of the column names that contain a specific string so that I can drop them from the dataframe.

My thought was to use something like this:

# Create list of names for unwanted columns.
lst = [col for col in df.columns if 'ISTD' in col]
# Returns empty.

# Drop columns from dataframe.
df.drop(labels = lst, axis=1, level=0, inplace=True)

The list comes back empty, though, so I guess the issue is that I don't know how to properly select columns in multi-indexed dataframes. I'm finding the documentation difficult to understand, so I'm hoping for answers here.

Here are what my column names look like for reference:

df.columns
Out[44]: 
MultiIndex([('115  In ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('115  In ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '137  Ba  [ He Gas ] ',           'Conc. RSD'),
            (         '137  Ba  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '137  Ba  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ('159  Tb ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('159  Tb ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ('175  Lu ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('175  Lu ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '208  Pb  [ He Gas ] ',           'Conc. RSD'),
            (         '208  Pb  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '208  Pb  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ',           'Conc. RSD'),
            (          '23  Na  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ',           'Conc. RSD'),
            (          '24  Mg  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ',           'Conc. RSD'),
            (          '27  Al  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ',           'Conc. RSD'),
            (           '39  K  [ He Gas ] ',       'Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ',           'Conc. RSD'),
            (          '44  Ca  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '52  Cr  [ He Gas ] ',           'Conc. RSD'),
            (          '52  Cr  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '52  Cr  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ',           'Conc. RSD'),
            (          '55  Mn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ',           'Conc. RSD'),
            (          '56  Fe  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ',           'Conc. RSD'),
            (          '60  Ni  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ',           'Conc. RSD'),
            (          '63  Cu  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ',           'Conc. RSD'),
            (          '66  Zn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (  '7  Li ( ISTD )  [ He Gas ] ',                 'CPS'),
            (  '7  Li ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '75  As  [ He Gas ] ',           'Conc. RSD'),
            (          '75  As  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '75  As  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '78  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '82  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ',           'Conc. RSD'),
            (          '95  Mo  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (                       'Sample',      'Acq. Date-Time'),
            (                       'Sample',             'Comment'),
            (                       'Sample',           'Data File'),
            (                       'Sample',               'Level'),
            (                       'Sample',                'Rjct'),
            (                       'Sample',         'Sample Name'),
            (                       'Sample',          'Total Dil.'),
            (                       'Sample',                'Type'),
            (                       'Sample',  'Unnamed: 0_level_1'),
            (                       'Sample',         'Vial Number')]

Thanks for reading.

With multi-level columns, df.columns returns a MultiIndex, which you can think of as a list of tuples.

You can iterate over them and drop the matching ones like this:

cols = [(first, second) for first, second in df.columns if 'ISTD' in first]
df = df.drop(cols, axis=1)

This looks for "ISTD" only in the first level (the first value of each tuple you get from df.columns), which is where it appears in your column names. Because cols holds full tuples, no level argument is needed.
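As a self-contained sketch of iterating over the column tuples, here is a tiny stand-in for the question's frame (the four columns are copied from the question, the row of values is made up). It also shows why the comprehension in the question comes back empty: on a tuple, `in` tests whether an element equals 'ISTD', it does not search inside the strings.

```python
import pandas as pd

# Small stand-in for the question's frame; the values are made up.
columns = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('Sample', 'Sample Name'),
])
df = pd.DataFrame([[1.0, 2.0, 3.0, 'a']], columns=columns)

# Iterating over a MultiIndex yields tuples, so this checks whether some
# element of the tuple *equals* 'ISTD'; it is not a substring search.
naive = [col for col in df.columns if 'ISTD' in col]

# Testing the first element of each tuple performs the substring search.
istd_cols = [col for col in df.columns if 'ISTD' in col[0]]

# Full tuples identify the columns exactly, so no level argument is needed.
cleaned = df.drop(columns=istd_cols)
```

Here naive stays empty, while istd_cols picks up the two In ( ISTD ) columns.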

Multi-index columns behave like a list of tuples, so you can test the first element of each tuple:

lst = [col for col in df.columns if 'ISTD' in col[0]]
df = df.drop(lst, axis=1)
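If you would rather keep the level=0 argument from the question, drop() then expects labels from that level (plain strings) rather than full tuples. A minimal sketch, assuming a small made-up frame with column names taken from the question:

```python
import pandas as pd

# Made-up one-row frame using column names from the question.
columns = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('159  Tb ( ISTD )  [ He Gas ] ', 'CPS'),
])
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=columns)

# Unique first-level labels that mention ISTD (strings, not tuples).
istd_labels = [
    name for name in df.columns.get_level_values(0).unique()
    if 'ISTD' in name
]

# With level=0, each label removes every column grouped under it.
cleaned = df.drop(columns=istd_labels, level=0)
```

This removes whole groups at once, which is convenient when every sub-column under an ISTD heading should go.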

You don't need to create a list at all: you can skip the unwanted columns while reading the file by passing a callable to "usecols".

data = pd.read_excel(directory, usecols=lambda x: "unwanted_string" not in x)
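pd.read_csv accepts the same callable form of usecols, which makes the idea easy to try without an Excel file. The sketch below uses made-up CSV text with a single header row, so it illustrates the callable only, not the two-level header from the question.

```python
import io

import pandas as pd

# Made-up data with a single header row, for illustration only.
csv_text = "Sample,In ISTD,Ba\ns1,100,1.5\ns2,110,1.7\n"

# The callable receives each column name and returns True to keep it.
data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: "ISTD" not in name,
)
```

Only the Sample and Ba columns are parsed; the ISTD column never enters the dataframe.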

If you still want to make a list, you can get the header row separately, then go through that list to eliminate ones with the unwanted string.

# Read the header row in as a list of column names.
cols = pd.read_excel(directory, header=None, nrows=1, index_col=0).values[0]
cols = cols.tolist()

# Keep only the names that do not contain the unwanted string.
# (Calling cols.remove() inside a loop over cols would skip elements.)
cols = [item for item in cols if "string" not in str(item)]

# Then assign the list as the columns of the dataframe.
# The number of names must match the number of columns in data.
data.columns = cols
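One thing to watch when filtering a list of names: calling remove() on a list while looping over it skips the element that follows each removal, so adjacent matches can survive. A small sketch with made-up names shows the difference from building a new list:

```python
cols = ["Sample", "In ISTD", "Tb ISTD", "Ba"]

# Removing while iterating: after "In ISTD" is removed, the loop's
# position moves past "Tb ISTD" without ever testing it.
looped = list(cols)
for item in looped:
    if "ISTD" in item:
        looped.remove(item)

# Building a new list tests every element exactly once.
filtered = [item for item in cols if "ISTD" not in str(item)]
```

After this runs, looped still contains "Tb ISTD", while filtered holds only "Sample" and "Ba".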

Here is yet another way. First, create a sample MultiIndex with 4 rows (each row is a tuple):

import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
        ('115  In ( ISTD )  [ He Gas ] ',           'CPS'),
        ('115  In ( ISTD )  [ He Gas ] ',       'CPS RSD'),
        (         '137  Ba  [ He Gas ] ',     'Conc. RSD'),
        (         '137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])

Now, create the mask (looking for ISTD in the first part of the multi index):

mask = np.array(['ISTD' in idx for idx in midx.get_level_values(0)])
midx[~mask]

MultiIndex([('137  Ba  [ He Gas ] ',     'Conc. RSD'),
            ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]')],
           )
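The same boolean mask can be applied to a dataframe's column axis directly. A minimal sketch, assuming a made-up 2x4 frame that uses the sample MultiIndex as its columns:

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])

# Made-up values; only the column labels matter here.
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=midx)

# True where the first-level label mentions ISTD.
mask = np.array(['ISTD' in label for label in midx.get_level_values(0)])

# Boolean selection along the column axis keeps the non-ISTD columns.
kept = df.loc[:, ~mask]
```

Selecting with the inverted mask avoids building a list of labels and leaves the original frame untouched.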
