
Python: How do you filter out columns from a dataset based on a substring match in column names

import pandas as pd

df_train = pd.read_csv('../xyz.csv')
headers = df_train.columns

I want to filter out the columns in headers whose names contain the substring _pct.

Use df.filter:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b_pct': [1, 2, 3], 'c_pct': [1, 2, 3], 'd': [1] * 3})

# Keep only the columns whose names do not contain '_pct'
print(df.filter(items=[i for i in df.columns if '_pct' not in i]))

# or, as jezrael suggested:
# print(df[[i for i in df.columns if '_pct' not in i]])

Output:

   a  d
0  1  1
1  2  1
2  3  1

Use:

# data from AkshayNevrekar's answer
df = df.loc[:, ~df.columns.str.contains('_pct')]
print(df)
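One thing worth knowing about the str.contains approach: it treats its pattern as a regular expression by default, so a substring containing regex metacharacters can match more than intended. The sketch below uses hypothetical column names (a.pct, axpct, b) that are not from the question, only to show the difference that regex=False makes.

```python
import pandas as pd

# Hypothetical column names, chosen so that '.' matters
cols = pd.Index(['a.pct', 'axpct', 'b'])

# Default: the pattern is a regex, so '.' matches any character
as_regex = list(cols.str.contains('a.pct'))

# regex=False: the pattern is matched as a literal substring
as_literal = list(cols.str.contains('a.pct', regex=False))

print(as_regex)    # [True, True, False]
print(as_literal)  # [True, False, False]
```

For a plain substring such as '_pct' both forms give the same result, so this only matters when the substring contains characters like '.', '+' or '('.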

A solution using filter is less straightforward, because the regex needs a negative lookahead:

df = df.filter(regex=r'^(?!.*_pct).*$')

   a  d
0  1  1
1  2  1
2  3  1
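To see why that regex works, here is a minimal sketch using the same sample column names. The negative lookahead (?!.*_pct), anchored at the start of the name, rejects any name that contains _pct anywhere. DataFrame.filter applies the pattern to each label with re.search, so testing the pattern directly with the re module should give the same selection.

```python
import re

import pandas as pd

# Same pattern as above: match only names that do NOT contain '_pct'
pattern = re.compile(r'^(?!.*_pct).*$')

names = ['a', 'b_pct', 'c_pct', 'd']
kept = [n for n in names if pattern.search(n)]
print(kept)  # ['a', 'd']

# The same pattern passed to DataFrame.filter
df = pd.DataFrame({'a': [1, 2, 3], 'b_pct': [1, 2, 3], 'c_pct': [1, 2, 3], 'd': [1] * 3})
filtered = df.filter(regex=pattern.pattern)
print(list(filtered.columns))  # ['a', 'd']
```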

Thank you, @IanS, for two more solutions:

df[df.columns.difference(df.filter(like='_pct').columns).tolist()]

df.drop(df.filter(like='_pct').columns, axis=1)
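These two alternatives are not quite interchangeable: Index.difference returns a sorted result by default, while drop keeps the original column order. The sketch below uses a hypothetical DataFrame with deliberately unsorted column names (z, b_pct, a) to make the difference visible.

```python
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order
df = pd.DataFrame({'z': [1, 2], 'b_pct': [3, 4], 'a': [5, 6]})

pct_cols = df.filter(like='_pct').columns

# difference() sorts the remaining labels
via_difference = df[df.columns.difference(pct_cols).tolist()]

# drop() preserves the original order
via_drop = df.drop(pct_cols, axis=1)

print(list(via_difference.columns))  # ['a', 'z']
print(list(via_drop.columns))        # ['z', 'a']
```

If column order matters, drop is probably the safer choice of the two.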

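Since df.columns returns the column names (as an Index you can iterate over), you can use a list comprehension with a simple condition to build a new list: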

new_headers = [x for x in headers if '_pct' not in x]
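The new list can then be used to select the columns from the DataFrame. In this sketch a small in-memory DataFrame stands in for the CSV from the question (that is an assumption, since the file contents are not shown), and df_filtered is a name introduced here for illustration.

```python
import pandas as pd

# Stand-in for pd.read_csv('../xyz.csv'); the real file is not available here
df_train = pd.DataFrame({'a': [1, 2, 3], 'b_pct': [1, 2, 3], 'c_pct': [1, 2, 3], 'd': [1] * 3})
headers = df_train.columns

# Names without the '_pct' substring
new_headers = [x for x in headers if '_pct' not in x]

# Select only those columns (original order is preserved)
df_filtered = df_train[new_headers]
print(list(df_filtered.columns))  # ['a', 'd']
```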
