Filtering Pandas Dataframe Based on List of Column Names

Question

I have a pandas DataFrame with maybe 1000 columns. However, I do not need that many columns. I only need the columns whose names match, start with, or contain specific strings.

So let's say I have a DataFrame whose columns look like df.columns =

  HYTY, ABNH, CDKL, GHY@UIKI, BYUJI@#hy, BYUJI@tt, BBNNII#5, FGATAY@J, ....

I want to select only the columns whose names are like HYTY, CDKL, BYUJI* and BBNNI*.

So what I was trying to do is to create a list of regular expressions like:

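  import re

  # patterns for the column names I want to keep
  relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNNI*']

  # escape each entry, then join them into one alternation pattern
  my_w_lst = [re.escape(s) for s in relst]
  mask_pattrn = '|'.join(my_w_lst)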

Then I create a boolean vector of True/False values that says whether each column name matches. However, I don't understand how to get from this to a DataFrame containing only the columns marked True.

Any help will be appreciated.

Answer 1

Using what you already have, you can pass your pattern to DataFrame.filter through its regex argument:

df.filter(regex=mask_pattrn)
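
One caveat worth checking before relying on this: re.escape turns the trailing * in each entry into a literal asterisk, so with the escaped list only HYTY is likely to match. The following is a minimal sketch of one way around that, assuming the intent is "exact name or prefix". The sample frame and the names exact, prefixes, parts and pattern are illustrative and not from the question.

```python
import re

import pandas as pd

# Hypothetical one-row frame using the column names from the question.
df = pd.DataFrame(
    [[0] * 8],
    columns=["HYTY", "ABNH", "CDKL", "GHY@UIKI",
             "BYUJI@#hy", "BYUJI@tt", "BBNNII#5", "FGATAY@J"],
)

exact = ["HYTY", "CDKL"]        # names that must match in full
prefixes = ["BYUJI", "BBNNI"]   # names that only need to start this way

# Escape the literal text, then add the regex anchors ourselves:
# "^name$" for an exact match, "^prefix" for a starts-with match.
parts = [f"^{re.escape(s)}$" for s in exact]
parts += [f"^{re.escape(s)}" for s in prefixes]
pattern = "|".join(parts)

# filter(regex=...) keeps the columns whose name the pattern matches.
subdf = df.filter(regex=pattern)
selected = list(subdf.columns)
```

Keeping the exact names and the prefixes in separate lists avoids encoding the wildcard in the strings themselves, so re.escape can be applied safely to every entry.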

Answer 2

We can use str.startswith with a tuple of prefixes, combined with isin for the name that must match exactly:

relst = ['CDKL', 'BYUJI', 'BBNNI']

subdf = df.loc[:, df.columns.str.startswith(tuple(relst)) | df.columns.isin(['HYTY'])]
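
To see what the two boolean masks contain, here is a small self-contained sketch of the same idea. It assumes a pandas version whose str.startswith accepts a tuple of prefixes. The sample frame and the names prefixes, starts, is_exact, mask and kept are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
df = pd.DataFrame(
    [[0] * 8],
    columns=["HYTY", "ABNH", "CDKL", "GHY@UIKI",
             "BYUJI@#hy", "BYUJI@tt", "BBNNII#5", "FGATAY@J"],
)

prefixes = ("CDKL", "BYUJI", "BBNNI")

# True where the column name starts with any of the prefixes.
starts = df.columns.str.startswith(prefixes)
# True where the column name is exactly "HYTY".
is_exact = df.columns.isin(["HYTY"])

# Combine both conditions and select the matching columns by position.
mask = starts | is_exact
subdf = df.loc[:, mask]
kept = list(subdf.columns)
```

Because the mask is a plain boolean array with one entry per column, the same df.loc[:, mask] step works for any other way of computing the True/False vector.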

Answer 3

Use re.findall(). It will give you a list of column names that you can pass to df[mylist].
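
The answer does not spell out the details, so the following is only one possible reading of it: apply re.findall to each column name and keep the names for which it returns a non-empty list. The pattern and the sample frame are assumptions made for illustration.

```python
import re

import pandas as pd

# Hypothetical one-row frame using the column names from the question.
df = pd.DataFrame(
    [[0] * 8],
    columns=["HYTY", "ABNH", "CDKL", "GHY@UIKI",
             "BYUJI@#hy", "BYUJI@tt", "BBNNII#5", "FGATAY@J"],
)

# Exact names are anchored with $, prefixes are left open-ended.
pattern = r"^(?:HYTY$|CDKL$|BYUJI|BBNNI)"

# re.findall returns an empty list (falsy) when nothing matches,
# so only the matching column names survive the comprehension.
mylist = [col for col in df.columns if re.findall(pattern, col)]

subdf = df[mylist]
```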

3 answers

solution1
2 ACCEPTED 2020-08-12 00:11:19

solution2
1 2020-08-12 00:09:01

solution3
1 2020-08-12 00:10:46
