Python: How do you filter out columns from a dataset based on a substring match in column names?
import pandas as pd
df_train = pd.read_csv('../xyz.csv')
headers = df_train.columns
I want to filter out the columns whose header contains the string _pct.
df = pd.DataFrame({'a':[1,2,3], 'b_pct':[1,2,3],'c_pct':[1,2,3],'d':[1]*3})
print(df.filter(items=[i for i in df.columns if '_pct' not in i]))
## or as jezrael suggested
# print(df[[i for i in df.columns if '_pct' not in i]])
Output:
   a  d
0  1  1
1  2  1
2  3  1
Use:
# data from AkshayNevrekar's answer
df = df.loc[:, ~df.columns.str.contains('_pct')]
print(df)
A solution with filter is not trivial, because it needs a regex with a negative lookahead:
df = df.filter(regex=r'^(?!.*_pct).*$')
   a  d
0  1  1
1  2  1
2  3  1
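The regex in the filter call relies on a negative lookahead: `(?!.*_pct)` is checked at the start of the name and fails if `_pct` appears anywhere after it. A minimal sketch of that behaviour using the standard `re` module directly, on the same column names as the sample frame (the variable names `pattern`, `columns`, and `kept` are only illustrative):

```python
import re

# Same pattern as in the df.filter(regex=...) call above.
pattern = re.compile(r'^(?!.*_pct).*$')

columns = ['a', 'b_pct', 'c_pct', 'd']

# Keep a name only if the pattern matches, i.e. "_pct" occurs nowhere in it.
kept = [c for c in columns if pattern.search(c)]
print(kept)
```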
Thanks to @IanS for another solution:
df[df.columns.difference(df.filter(like='_pct').columns).tolist()]
df.drop(df.filter(like='_pct').columns, axis=1)
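The variants shown so far should select the same columns on the sample frame. A quick check, written as a sketch rather than a benchmark: the names `by_mask`, `by_regex`, `by_drop`, and `by_diff` are made up here. One caveat worth knowing is that `Index.difference` returns a sorted result by default, so that variant can reorder columns when the original column order is not already sorted.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b_pct': [1, 2, 3],
                   'c_pct': [1, 2, 3], 'd': [1] * 3})

# Boolean mask over the column names.
by_mask = df.loc[:, ~df.columns.str.contains('_pct')]
# Regex with a negative lookahead passed to filter.
by_regex = df.filter(regex=r'^(?!.*_pct).*$')
# Drop the columns that filter(like=...) finds.
by_drop = df.drop(df.filter(like='_pct').columns, axis=1)
# Set difference of column names (sorted by default).
by_diff = df[df.columns.difference(df.filter(like='_pct').columns).tolist()]

# Compare every variant against the boolean-mask result.
same = all(r.equals(by_mask) for r in (by_regex, by_drop, by_diff))
print(same)
```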
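Since df.columns returns the column names (as an Index that can be iterated like a list), you can use a list comprehension with a simple condition to build a new list: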
new_headers = [x for x in headers if '_pct' not in x]
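The list built this way only holds names; to actually drop the columns, it can be used to index the DataFrame. A minimal sketch, assuming a small stand-in DataFrame in place of the CSV from the question (the data and the name `df_filtered` are made up for illustration):

```python
import pandas as pd

# Stand-in for pd.read_csv('../xyz.csv'); the values are illustrative only.
df_train = pd.DataFrame({'a': [1, 2, 3], 'b_pct': [1, 2, 3],
                         'c_pct': [1, 2, 3], 'd': [1] * 3})

headers = df_train.columns
new_headers = [x for x in headers if '_pct' not in x]

# Select only the columns whose names passed the condition.
df_filtered = df_train[new_headers]
print(df_filtered.columns.tolist())
```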