简体   繁体   中英

Filtering Pandas Dataframe Based on List of Column Names

I have a pandas dataframe which has may be 1000 Columns. However I do not need so many columns> I need columns only if they match/starts/contains specific strings.

So lets say I have a dataframe columns like df.columns =

  HYTY, ABNH, CDKL, GHY@UIKI,  BYUJI@#hy  BYUJI@tt  BBNNII#5  FGATAY@J ....

I want to select columns whose name are only like HYTY, CDKL, BYUJI* & BBNNI*

So what I was trying to do is to create a list of regular expressions like:

  import re 

  relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNI*']


  my_w_lst = [re.escape(s) for s in relst]

  mask_pattrn = '|'.join(my_w_lst)

Then I create the logical vector to give me a list of TRUE/FALSE to say whether the string is present or not. However, not understanding how to get the dataframe of only those true selected columns from this.

Any help will be appreciated.

Using what you already have you can pass your mask to filter like:

df.filter(regex=mask_pattrn)

We can do startswith

relst = ['CDKL', 'BYUJI', 'BBNI']

subdf = df.loc[:,df.columns.str.startswith(tuple(relst))|df.columns.isin(['HYTY'])]

Use re.findall() . It will give you a list of columns to pass to df[mylist]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM