
pandas dropping columns based on column name


Let's say there is a df with some column names; in my case the names are numeric values, for example columns named 1000, 1001, etc. I need to drop every column that doesn't pass a certain filter test: in my case, all columns whose names are less than a certain value, say 1500.

I know how to do this directly (by listing every column) or by calling drop in a loop, but that seems very inefficient. I'm having difficulty expressing it in the right syntax.

I have tried something like this:

df.drop(df.columns[x for x in df.columns.values<str(1500)], axis=1))

or

df.drop(df.columns.values<str(1500)], axis=1)

but these are obviously wrong.

Please advise! Thank you.

I think the simplest approach is to create a boolean mask and then select with loc:

import pandas as pd

df = pd.DataFrame(columns=range(10), index=[0])
print (df)
     0    1    2    3    4    5    6    7    8    9
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

# invert the boolean mask with ~
print (df.loc[:, ~(df.columns < 8)])
     8    9
0  NaN  NaN

print (df.columns >= 8)
[False False False False False False False False  True  True]

print (df.loc[:, df.columns >= 8])
     8    9
0  NaN  NaN
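Applied to the question's setup, the same mask-and-loc approach might look like the sketch below. The frame, its labels (1000, 1100, ..., 1900) and the `threshold` name are hypothetical stand-ins for the asker's data; keeping labels at or above 1500 is the same as dropping those below it.

```python
import pandas as pd

# Hypothetical frame mirroring the question: numeric column labels 1000, 1100, ..., 1900
df = pd.DataFrame([[0] * 10], columns=range(1000, 2000, 100))

threshold = 1500  # columns with labels below this value should go

# Boolean mask over the column labels; True marks the columns to keep
keep = df.columns >= threshold
result = df.loc[:, keep]

print(list(result.columns))  # [1500, 1600, 1700, 1800, 1900]
```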

This is the same as dropping the filtered column names:

print (df.columns[df.columns < 8])
Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

print (df.drop(df.columns[df.columns < 8], axis=1))
     8    9
0  NaN  NaN
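In pandas 0.21 and later, `drop` also accepts a `columns=` keyword, so the filtered-labels version can be written without `axis=1`. A minimal sketch, again using a hypothetical frame with labels 1000 to 1900:

```python
import pandas as pd

# Hypothetical frame with numeric column labels 1000, 1100, ..., 1900
df = pd.DataFrame([[0] * 10], columns=range(1000, 2000, 100))

threshold = 1500

# Select the labels that fail the filter, then drop them all in one call.
# drop(columns=...) is equivalent to drop(labels, axis=1).
to_drop = df.columns[df.columns < threshold]
result = df.drop(columns=to_drop)

print(list(to_drop))         # [1000, 1100, 1200, 1300, 1400]
print(list(result.columns))  # [1500, 1600, 1700, 1800, 1900]
```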

Consider a dataframe with column names 0 to 99.

    0   1   2   3   4   5   6   7   8   9   ... 90  91  92  93  94  95  96  97  98  99
0   0   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0

If you want to drop the columns whose names are less than 30:

df = df.drop([x for x in df.columns if x < 30], axis=1)

returns

    30  31  32  33  34  35  36  37  38  39  ... 90  91  92  93  94  95  96  97  98  99
0   0   0   0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0

If your column labels are strings (dtype object), convert them to integers first:

import numpy as np

df.columns = df.columns.astype(np.int64)
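The question compares against `str(1500)`, which hints that the labels may be strings. The sketch below uses hypothetical string labels to show why the conversion matters: string comparison is lexicographic, so `"999"` sorts after `"1500"` and wrongly survives the filter, while converting to integers first gives the intended result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose column labels are strings, e.g. read from a CSV header
df = pd.DataFrame([[0] * 4], columns=["999", "1000", "1500", "2000"])

# String comparison is lexicographic: "999" >= "1500" is True because "9" > "1",
# so a string threshold keeps the wrong columns.
wrong = df.loc[:, df.columns >= "1500"]
print(list(wrong.columns))  # ['999', '1500', '2000']

# Convert the labels to integers first, then compare numerically.
df.columns = df.columns.astype(np.int64)
right = df.loc[:, df.columns >= 1500]
print(list(right.columns))  # [1500, 2000]
```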

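First build a list of the columns to drop (iterate over the columns and check whether each one meets the condition), then drop all the columns in that list at once.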
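A minimal sketch of that two-step approach, assuming a hypothetical `fails_filter` function stands in for whatever test the columns must pass. Because the list is built in plain Python, the test can be any function of the label, not only a comparison pandas can vectorize:

```python
import pandas as pd

# Hypothetical frame with numeric column labels 1000, 1100, ..., 1900
df = pd.DataFrame([[0] * 10], columns=range(1000, 2000, 100))


def fails_filter(label):
    """Hypothetical test: return True for columns that should be removed."""
    return label < 1500


# Step 1: walk the column labels once and collect the ones that fail the test.
to_drop = [col for col in df.columns if fails_filter(col)]

# Step 2: drop them all in a single call, rather than calling drop() once per
# column (each drop() call returns a new DataFrame).
df = df.drop(columns=to_drop)

print(to_drop)           # [1000, 1100, 1200, 1300, 1400]
print(list(df.columns))  # [1500, 1600, 1700, 1800, 1900]
```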
