Filter a Pandas DataFrame based on a list of column names

Question

I have a pandas dataframe with perhaps 1000 columns. However, I do not need that many. I only need the columns whose names match, start with, or contain specific strings.

So let's say the dataframe has columns like df.columns =

  HYTY, ABNH, CDKL, GHY@UIKI, BYUJI@#hy, BYUJI@tt, BBNNII#5, FGATAY@J ....

I want to select only the columns whose names look like HYTY, CDKL, BYUJI* and BBNNI*.

So what I was trying to do is create a list of regular expressions like:

  import re

  relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNNI*']

  # Escape each entry, then join them into one alternation pattern
  my_w_lst = [re.escape(s) for s in relst]
  mask_pattrn = '|'.join(my_w_lst)

Then I create a logical vector of True/False values that says whether each string is present. However, I do not understand how to get a dataframe containing only the columns flagged True.

Any help will be appreciated.

Answer 1

Using what you already have, you can pass your mask to filter like this:

df.filter(regex=mask_pattrn)
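
One caveat: re.escape turns the trailing * in entries like 'BYUJI*' into a literal asterisk, so the joined pattern may not behave the way the wildcard notation suggests. Here is a minimal sketch, assuming the * is meant as a "starts with" wildcard and that entries without it should match the whole column name. The sample frame and the helper name to_regex are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI',
        'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([range(len(cols))], columns=cols)

relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNNI*']

def to_regex(name):
    # Hypothetical helper: a trailing '*' is read as "starts with";
    # any other entry must match the full column name.
    if name.endswith('*'):
        return '^' + re.escape(name[:-1])
    return '^' + re.escape(name) + '$'

mask_pattrn = '|'.join(to_regex(s) for s in relst)

# DataFrame.filter keeps the columns whose label matches the pattern.
subdf = df.filter(regex=mask_pattrn)
print(list(subdf.columns))
```

Anchoring with ^ keeps a name such as GHY@UIKI from being picked up just because a wanted string appears somewhere in the middle.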

Answer 2

We can use startswith:

relst = ['CDKL', 'BYUJI', 'BBNNI']

subdf = df.loc[:,df.columns.str.startswith(tuple(relst))|df.columns.isin(['HYTY'])]

Answer 3

Use re.findall(). It will give you a list of columns to pass to df[mylist].
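
A minimal sketch of that idea, assuming a small made-up frame and a pattern that treats HYTY as an exact name and the other entries as prefixes. re.findall returns an empty list when nothing matches, so its result can serve as a truth test for each column name.

```python
import re
import pandas as pd

# Hypothetical one-row frame using the column names from the question.
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI',
        'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([range(len(cols))], columns=cols)

# Assumed pattern: HYTY as a full name, the rest as prefixes.
pattern = r'^(?:HYTY$|CDKL|BYUJI|BBNNI)'

# Keep a column when re.findall finds at least one match in its name.
mylist = [c for c in df.columns if re.findall(pattern, c)]
subdf = df[mylist]
print(mylist)
```

Since only the existence of a match matters here, re.search or re.match would likely do the same job.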
