[英]Select certain columns based on multiple criteria in pandas
I have the following dataset:我有以下数据集:
my_df = pd.DataFrame({'id':[1,2,3,4,5],
'type':['corp','smb','smb','corp','mid'],
'sales':[34567,2190,1870,22000,10000],
'sales_roi':[.10,.21,.22,.15,.16],
'sales_pct':[.38,.05,.08,.30,.20],
'sales_ln':[4.2,2.1,2.0,4.1,4],
'cost_pct':[22000,1000,900,14000,5000],
'flag':[0,1,0,1,1],
'gibberish':['bla','ble','bla','ble','bla'],
'tech':['lnx','mst','mst','lnx','mc']})
my_df['type'] = pd.Categorical(my_df.type)
my_df
id type sales sales_roi sales_pct sales_ln cost_pct flag gibberish tech
0 1 corp 34567 0.10 0.38 4.2 22000 0 bla lnx
1 2 smb 2190 0.21 0.05 2.1 1000 1 ble mst
2 3 smb 1870 0.22 0.08 2.0 900 0 bla mst
3 4 corp 22000 0.15 0.30 4.1 14000 1 ble lnx
4 5 mid 10000 0.16 0.20 4.0 5000 1 bla mc
And I want to filter out all variables who end in "_pct" or "_ln" or are equal to "gibberish" or "tech".我想过滤掉所有以“_pct”或“_ln”结尾或等于“gibberish”或“tech”的变量。 This is what I have tried:
这是我尝试过的:
df_selected = df.loc[:, ~my_df.columns.str.endswith('_pct') &
~my_df.columns.str.endswith('_ln') &
~my_df.columns.str.contains('gibberish','tech')]
But it returns me an unwanted column ("tech"):但它返回给我一个不需要的列(“技术”):
id type sales sales_roi flag tech
0 1 corp 34567 0.10 0 lnx
1 2 smb 2190 0.21 1 mst
2 3 smb 1870 0.22 0 mst
3 4 corp 22000 0.15 1 lnx
4 5 mid 10000 0.16 1 mc
This is the expected result:这是预期的结果:
id type sales sales_roi flag
0 1 corp 34567 0.10 0
1 2 smb 2190 0.21 1
2 3 smb 1870 0.22 0
3 4 corp 22000 0.15 1
4 5 mid 10000 0.16 1
Please consider that I have to deal with hundreds of variables and this is just an example of what I need.请考虑我必须处理数百个变量,这只是我需要的一个例子。 Any help will be greatly appreciated.
任何帮助将不胜感激。
Currently, what you are doing will return every column because of how the conditions are written.目前,由于条件的编写方式,您正在执行的操作将返回每一列。
endswith
will accept tuples so just put all the columns you are looking for in a single tuple and then filter endswith
将接受元组,因此只需将您要查找的所有列放在一个元组中,然后过滤
my_df[my_df.columns[~my_df.columns.str.endswith(('_pct','_ln','gibberish','tech'))]]
id type sales sales_roi flag
0 1 corp 34567 0.10 0
1 2 smb 2190 0.21 1
2 3 smb 1870 0.22 0
3 4 corp 22000 0.15 1
4 5 mid 10000 0.16 1
I would do it like this:我会这样做:
criterion = ["_pct", "_ln", "gibberish", "tech"]
for column in my_df:
for criteria in criterion:
if criteria in column:
my_df = my_df.drop(column, axis=1)
Ofcourse you can change the if statement in line 3 to endswith or something of your choice.当然,您可以将第 3 行中的 if 语句更改为 endswith 或您选择的其他内容。 Hope this helped:)
希望这有帮助:)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.