
Filtering Pandas Dataframe Based on List of Column Names

I have a pandas DataFrame with perhaps 1000 columns. However, I do not need that many columns. I need a column only if its name matches, starts with, or contains specific strings.

So let's say I have a DataFrame whose columns look like df.columns =

  HYTY, ABNH, CDKL, GHY@UIKI, BYUJI@#hy, BYUJI@tt, BBNNII#5, FGATAY@J ....

I want to select only the columns whose names are like HYTY, CDKL, BYUJI* and BBNNI*.

So what I was trying to do is create a list of regular expressions like:

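  import re

  # Name patterns for the columns to keep
  relst = ['HYTY', 'CDKL*', 'BYUJI*', 'BBNI*']

  # Escape each entry (note: this also turns '*' into a literal asterisk)
  my_w_lst = [re.escape(s) for s in relst]

  # Join the alternatives into a single regex
  mask_pattrn = '|'.join(my_w_lst)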

Then I create a logical vector of True/False values that says whether each string is present. However, I do not understand how to get a DataFrame containing only the columns marked True from this.

Any help will be appreciated.

Using what you already have, you can pass your mask pattern to filter like:

df.filter(regex=mask_pattrn)
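
A minimal sketch of this approach on a toy frame built from the column names in the question. One caveat: `re.escape` turns the trailing `*` in entries such as 'CDKL*' into a literal asterisk, so this sketch assumes plain prefixes without `*` and anchors each one with `^`, while HYTY is matched in full with `$`. The names `exact`, `prefixes`, and `pattern` are illustrative, and the prefix BBNNI follows the spelling in the question's prose so that it matches BBNNII#5.

```python
import re
import pandas as pd

# Toy frame with the column names from the question (one row of dummy values)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([range(len(cols))], columns=cols)

exact = ['HYTY']                        # names that must match in full
prefixes = ['CDKL', 'BYUJI', 'BBNNI']   # names that only need to start with these

# Escape the literal text, then add the anchors outside the escaped part
parts = [f'^{re.escape(s)}$' for s in exact] + [f'^{re.escape(s)}' for s in prefixes]
pattern = '|'.join(parts)

# DataFrame.filter(regex=...) keeps the columns whose name matches the pattern
subdf = df.filter(regex=pattern)
print(list(subdf.columns))
```

Because `filter` searches anywhere in the name, leaving out the `^` anchors would also keep columns that merely contain one of the strings.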

We can use startswith:

relst = ['CDKL', 'BYUJI', 'BBNI']

# Keep columns that start with any prefix in relst, plus the exact name HYTY
subdf = df.loc[:, df.columns.str.startswith(tuple(relst)) | df.columns.isin(['HYTY'])]
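
In case the installed pandas version does not accept a tuple in `.str.startswith`, the same selection can be sketched with Python's built-in `str.startswith`, which does take a tuple. This is an alternative under that assumption, not the answer's own code. The toy frame and the names `prefixes`, `exact`, and `keep` are illustrative, and the prefix BBNNI is spelled so that it matches the BBNNII#5 column from the question.

```python
import pandas as pd

# Toy frame with the column names from the question (one row of dummy values)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([range(len(cols))], columns=cols)

prefixes = ('CDKL', 'BYUJI', 'BBNNI')   # str.startswith accepts a tuple of prefixes
exact = {'HYTY'}                        # names that must match in full

# Build the list of column names to keep, preserving the original order
keep = [c for c in df.columns if c in exact or c.startswith(prefixes)]
subdf = df[keep]
print(keep)
```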

Use re.findall(). It will give you a list of columns to pass to df[mylist].
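
The answer does not say how to call `re.findall()`, so the following is only one possible reading: join the column names with newlines and let a multiline pattern return each matching name whole. It assumes that no column name contains a newline. The toy frame and the exact pattern are illustrative.

```python
import re
import pandas as pd

# Toy frame with the column names from the question (one row of dummy values)
cols = ['HYTY', 'ABNH', 'CDKL', 'GHY@UIKI', 'BYUJI@#hy', 'BYUJI@tt', 'BBNNII#5', 'FGATAY@J']
df = pd.DataFrame([range(len(cols))], columns=cols)

# One name per line; with re.MULTILINE, ^ and $ anchor at every line boundary.
# HYTY must match in full, the other alternatives only as a prefix.
pattern = re.compile(r'^(?:HYTY$|CDKL|BYUJI|BBNNI).*$', re.MULTILINE)
mylist = pattern.findall('\n'.join(df.columns))

subdf = df[mylist]
print(mylist)
```

The group is non-capturing, so `findall` returns the full matched names rather than only the matched prefix.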
