Python: How do you filter out columns from a dataset based on a substring match in column names?
import pandas as pd
df_train = pd.read_csv('../xyz.csv')
headers = df_train.columns
I want to filter out the columns in headers whose names contain the substring _pct.
Use:
#data from AkshayNevrekar answer
df = df.loc[:, ~df.columns.str.contains('_pct')]
print(df)
A solution with filter is less trivial, because the regex needs a negative lookahead to exclude names containing _pct:
df = df.filter(regex=r'^(?!.*_pct).*$')
   a  d
0  1  1
1  2  1
2  3  1
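The two approaches above can be checked against each other on a small frame. This is a minimal sketch: the sample data and the column names b_pct and c_pct are made up for illustration, since the data from the referenced answer is not reproduced here.

```python
import pandas as pd

# Hypothetical sample data; only the '_pct' naming pattern comes from the question.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b_pct": [0.1, 0.2, 0.3],
    "c_pct": [0.5, 0.6, 0.7],
    "d": [1, 1, 1],
})

# Boolean mask: True for every column whose name does NOT contain '_pct'.
# regex=False treats '_pct' as a literal substring rather than a pattern.
keep = ~df.columns.str.contains("_pct", regex=False)
by_mask = df.loc[:, keep]

# Regex with a negative lookahead: matches only names with no '_pct' anywhere.
by_regex = df.filter(regex=r"^(?!.*_pct).*$")

print(by_mask.columns.tolist())
print(by_mask.equals(by_regex))
```

Both selections keep the original column order, so either can be used when order matters.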
Thank you, @IanS, for these other solutions:
df[df.columns.difference(df.filter(like='_pct').columns).tolist()]
df.drop(df.filter(like='_pct').columns, axis=1)
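One difference between these two alternatives is worth knowing: by default, Index.difference returns its result sorted, while drop keeps the original column order. The sketch below uses made-up data, with columns deliberately out of alphabetical order, to make that visible.

```python
import pandas as pd

# Hypothetical data; columns are intentionally not in alphabetical order.
df = pd.DataFrame({
    "d": [1, 1, 1],
    "x_pct": [0.1, 0.2, 0.3],
    "a": [1, 2, 3],
})

# Columns whose name contains '_pct'.
pct_cols = df.filter(like="_pct").columns

# difference() sorts the remaining labels by default.
via_difference = df[df.columns.difference(pct_cols).tolist()]

# drop() removes the labels and leaves the remaining order untouched.
via_drop = df.drop(pct_cols, axis=1)

print(via_difference.columns.tolist())
print(via_drop.columns.tolist())
```

If the column order of the original file should be preserved, the drop variant is probably the safer choice.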
Since df.columns returns the column names as an iterable Index, you can use a list comprehension with a simple condition to build a new list:
new_headers = [x for x in headers if '_pct' not in x]
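The filtered list of names can then be used to select the columns from the frame. In this sketch, df_train is a small made-up stand-in for the contents of xyz.csv, and df_filtered is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('../xyz.csv').
df_train = pd.DataFrame({
    "a": [1, 2, 3],
    "b_pct": [0.1, 0.2, 0.3],
    "d": [1, 1, 1],
})
headers = df_train.columns

# Keep only the names that do not contain the substring '_pct'.
new_headers = [x for x in headers if "_pct" not in x]

# Indexing with a list of names returns a DataFrame with just those columns.
df_filtered = df_train[new_headers]

print(new_headers)
print(df_filtered.shape)
```

This assumes all column names are strings; a non-string label would make the `in` test raise a TypeError.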