Creating a list of column names in a MultiIndex pandas dataframe

Question

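I have a convoluted list of column names in a dataframe that I read from an Excel sheet. The data is imported as a MultiIndex dataframe with two levels of column labels. I want to create a list of the column names that contain a specific string so that I can drop those columns from the dataframe.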

My idea was to use something like this:

# Create list of names for unwanted columns.
lst = [col for col in df.columns if 'ISTD' in col]
# Returns empty.

# Drop columns from dataframe.
df.drop(labels = lst, axis=1, level=0, inplace=True)

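The list comes back empty, so I suspect the problem is that I don't know how to correctly select columns in a MultiIndex dataframe. I found the documentation hard to follow, so I'm hoping to get an answer here.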
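A minimal sketch of why the comprehension comes back empty, using two of the labels listed below: iterating over a MultiIndex yields tuples, and `in` on a tuple checks whether some element is exactly equal to 'ISTD', not whether a label contains it as a substring. The variable names are illustrative only.

```python
import pandas as pd

# Two-level column index shaped like the question's (only two labels kept)
cols = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
])

first = cols[0]
print(type(first))  # <class 'tuple'>

# `in` on a tuple asks "is some element EQUAL to 'ISTD'?", which is False here
print('ISTD' in first)

# `in` on one string element is a substring test, which is what was intended
print('ISTD' in first[0])

# Hence the original comprehension collects nothing
lst = [col for col in cols if 'ISTD' in col]
print(lst)
```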

Here are my column names for reference:

df.columns
Out[44]: 
MultiIndex([('115  In ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('115  In ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '137  Ba  [ He Gas ] ',           'Conc. RSD'),
            (         '137  Ba  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '137  Ba  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ('159  Tb ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('159  Tb ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ('175  Lu ( ISTD )  [ He Gas ] ',                 'CPS'),
            ('175  Lu ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (         '208  Pb  [ He Gas ] ',           'Conc. RSD'),
            (         '208  Pb  [ He Gas ] ',       'Conc. [ ppb ]'),
            (         '208  Pb  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ',           'Conc. RSD'),
            (          '23  Na  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '23  Na  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ',           'Conc. RSD'),
            (          '24  Mg  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '24  Mg  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ',           'Conc. RSD'),
            (          '27  Al  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '27  Al  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ',           'Conc. RSD'),
            (           '39  K  [ He Gas ] ',       'Conc. [ ppb ]'),
            (           '39  K  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ',           'Conc. RSD'),
            (          '44  Ca  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '44  Ca  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '45  Sc ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '52  Cr  [ He Gas ] ',           'Conc. RSD'),
            (          '52  Cr  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '52  Cr  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ',           'Conc. RSD'),
            (          '55  Mn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '55  Mn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ',           'Conc. RSD'),
            (          '56  Fe  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '56  Fe  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ',           'Conc. RSD'),
            (          '60  Ni  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '60  Ni  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ',           'Conc. RSD'),
            (          '63  Cu  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '63  Cu  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ',           'Conc. RSD'),
            (          '66  Zn  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '66  Zn  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (  '7  Li ( ISTD )  [ He Gas ] ',                 'CPS'),
            (  '7  Li ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',                 'CPS'),
            ( '72  Ge ( ISTD )  [ He Gas ] ',             'CPS RSD'),
            (          '75  As  [ He Gas ] ',           'Conc. RSD'),
            (          '75  As  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '75  As  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '78  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '78  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ',           'Conc. RSD'),
            (          '82  Se  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '82  Se  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ',           'Conc. RSD'),
            (          '95  Mo  [ He Gas ] ',       'Conc. [ ppb ]'),
            (          '95  Mo  [ He Gas ] ', 'Meas. Conc. [ ppb ]'),
            (                       'Sample',      'Acq. Date-Time'),
            (                       'Sample',             'Comment'),
            (                       'Sample',           'Data File'),
            (                       'Sample',               'Level'),
            (                       'Sample',                'Rjct'),
            (                       'Sample',         'Sample Name'),
            (                       'Sample',          'Total Dil.'),
            (                       'Sample',                'Type'),
            (                       'Sample',  'Unnamed: 0_level_1'),
            (                       'Sample',         'Vial Number')]

Thanks for reading.

Answer 1

With multi-level columns, df.columns returns an object of type MultiIndex, which you can treat as a list of tuples.

You can iterate over them, collect the ones you want, and drop them like this:

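cols = [(first, second) for first, second in df.columns if 'ISTD' in first]
df = df.drop(cols, axis=1)

This looks for "ISTD" only in the first level (the first value of each tuple you get from df.columns), which is where your column names contain it; to search the second level instead, test second. Because each entry in cols is a full (first, second) tuple, drop() needs no level argument.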
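The same idea as a small reusable sketch. `columns_containing` is a hypothetical helper (not part of pandas) that tests the label at a chosen level and returns full column tuples; the four-column frame is a cut-down sample built from the question's labels.

```python
import pandas as pd

def columns_containing(df, text, level=0):
    """Hypothetical helper: full column tuples whose label at `level` contains `text`."""
    return [col for col in df.columns if text in col[level]]

columns = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])
df = pd.DataFrame([[1, 2, 3, 4]], columns=columns)

# Collect the full (level 0, level 1) keys, then drop them as columns
unwanted = columns_containing(df, 'ISTD', level=0)
df = df.drop(columns=unwanted)
print(df.columns.tolist())
```

Passing `level=1` instead searches the second element of each tuple (for example 'RSD').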

Answer 2

The columns of a MultiIndex behave like a list of tuples, so you can test one element of each tuple:

lst = [col for col in df.columns if 'ISTD' in col[0]]
df = df.drop(lst, axis=1)
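The question's original `drop(..., level=0)` call also works once the list holds first-level label strings rather than full tuples. A sketch, assuming the same two-level layout as the question, reduced here to four made-up columns:

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ('45  Sc ( ISTD )  [ He Gas ] ', 'CPS'),
    ('45  Sc ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('52  Cr  [ He Gas ] ', 'Conc. RSD'),
    ('Sample', 'Sample Name'),
])
df = pd.DataFrame([[1.0, 2.0, 3.0, 'a']], columns=columns)

# Collect unique first-level labels (plain strings), not full tuples
istd_names = [name for name in df.columns.get_level_values(0).unique() if 'ISTD' in name]

# With level=0, every column whose first-level label is in the list is removed
df = df.drop(columns=istd_names, level=0)
print(df.columns.get_level_values(0).unique().tolist())
```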

Answer 3

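You don't need to build a list at all: you can skip the unwanted columns while reading the file by passing a callable to usecols: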

import pandas as pd

# Keep a column only if its name does not contain the unwanted string
data = pd.read_excel(directory, usecols=lambda x: "unwanted_string" not in str(x))
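A self-contained sketch of the usecols filter. It swaps in `pd.read_csv` on an in-memory CSV so it runs without an Excel file or engine; `read_csv`'s `usecols` also accepts a callable that is evaluated against each column name. The sample header is made up and has a single header row; how the callable behaves with the question's two-row header is not checked here.

```python
import io

import pandas as pd

# Stand-in for the spreadsheet: one header row plus one data row
csv_text = "Sample Name,137 Ba,115 In ( ISTD ),159 Tb ( ISTD )\nblank,1.5,900,850\n"

# The callable receives each column name; the column is parsed only if it returns True
data = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: 'ISTD' not in str(name))
print(data.columns.tolist())  # ['Sample Name', '137 Ba']
```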

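If you still want to make a list, you can read the header row on its own and then go through that list to remove the entries that contain the unwanted string: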

# Read only the header row and get the column names as a list
cols = pd.read_excel(directory, header=None, nrows=1).values[0].tolist()

# Keep the names that do not contain the unwanted string.
# Building a new list avoids the skipped items you get when
# calling remove() on a list while iterating over it.
cols = [item for item in cols if "unwanted_string" not in str(item)]

# Then read only those columns
data = pd.read_excel(directory, usecols=cols)
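Calling `list.remove()` on a list while iterating over it skips elements, which is why filtering into a new list is safer. A small demonstration with made-up names:

```python
names = ['Sample', '115 In ( ISTD )', '159 Tb ( ISTD )', '137 Ba']

# Pitfall: each remove() shifts the remaining items left, so the item
# right after a removed one is never examined by the loop
buggy = list(names)
for item in buggy:
    if 'ISTD' in str(item):
        buggy.remove(item)

# Safe: build a new list instead of mutating the one being iterated
fixed = [item for item in names if 'ISTD' not in str(item)]

print(buggy)  # ['Sample', '159 Tb ( ISTD )', '137 Ba']
print(fixed)  # ['Sample', '137 Ba']
```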

Answer 4

Here is another way. First, create a sample MultiIndex with 4 entries (each entry is a tuple):

import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
        ('115  In ( ISTD )  [ He Gas ] ',           'CPS'),
        ('115  In ( ISTD )  [ He Gas ] ',       'CPS RSD'),
        (         '137  Ba  [ He Gas ] ',     'Conc. RSD'),
        (         '137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])

Now create a mask (looking for ISTD in the first level of the MultiIndex) and use its negation to keep the other entries:

mask = np.array(['ISTD' in idx for idx in midx.get_level_values(0)])
midx[~mask]

MultiIndex([('137  Ba  [ He Gas ] ',     'Conc. RSD'),
            ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]')],
           )
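Applying the same kind of mask to an actual DataFrame rather than to the bare MultiIndex. A sketch with made-up numeric data; `.loc[:, ~mask]` keeps the columns where the mask is False.

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_tuples([
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS'),
    ('115  In ( ISTD )  [ He Gas ] ', 'CPS RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. RSD'),
    ('137  Ba  [ He Gas ] ', 'Conc. [ ppb ]'),
])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=midx)

# True where the first-level label contains 'ISTD'
mask = np.array(['ISTD' in name for name in df.columns.get_level_values(0)])

# Select all rows and only the columns where the mask is False
df = df.loc[:, ~mask]
print(df)
```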

Question

4 answers

Answer 1
1, accepted, 2020-07-31 19:50:47

Answer 2
1, 2020-07-31 19:53:18

Answer 3
0, 2020-07-31 19:49:08

Answer 4
0, 2020-07-31 21:15:11
