
Select certain columns based on multiple criteria in pandas

I have the following dataset:

import pandas as pd

my_df = pd.DataFrame({'id':[1,2,3,4,5],
                      'type':['corp','smb','smb','corp','mid'],
                      'sales':[34567,2190,1870,22000,10000],
                      'sales_roi':[.10,.21,.22,.15,.16],
                      'sales_pct':[.38,.05,.08,.30,.20],
                      'sales_ln':[4.2,2.1,2.0,4.1,4],
                      'cost_pct':[22000,1000,900,14000,5000],
                      'flag':[0,1,0,1,1],
                      'gibberish':['bla','ble','bla','ble','bla'],
                      'tech':['lnx','mst','mst','lnx','mc']})
my_df['type'] = pd.Categorical(my_df.type)
my_df
    id  type    sales   sales_roi   sales_pct   sales_ln    cost_pct    flag    gibberish   tech
0   1   corp    34567   0.10        0.38        4.2         22000       0       bla         lnx
1   2   smb     2190    0.21        0.05        2.1         1000        1       ble         mst
2   3   smb     1870    0.22        0.08        2.0         900         0       bla         mst
3   4   corp    22000   0.15        0.30        4.1         14000       1       ble         lnx
4   5   mid     10000   0.16        0.20        4.0         5000        1       bla         mc

I want to drop all columns whose names end in "_pct" or "_ln", or are equal to "gibberish" or "tech". This is what I have tried:

df_selected = my_df.loc[:, ~my_df.columns.str.endswith('_pct') &
                           ~my_df.columns.str.endswith('_ln') &
                           ~my_df.columns.str.contains('gibberish','tech')]

But it returns an unwanted column ("tech"):

    id  type    sales   sales_roi   flag    tech
0   1   corp    34567   0.10        0       lnx
1   2   smb     2190    0.21        1       mst
2   3   smb     1870    0.22        0       mst
3   4   corp    22000   0.15        1       lnx
4   5   mid     10000   0.16        1       mc

This is the expected result:

    id  type    sales   sales_roi   flag
0   1   corp    34567   0.10        0   
1   2   smb     2190    0.21        1   
2   3   smb     1870    0.22        0    
3   4   corp    22000   0.15        1   
4   5   mid     10000   0.16        1    

Please note that I have to deal with hundreds of variables, and this is just an example of what I need. Any help will be greatly appreciated.

Your attempt keeps "tech" because str.contains takes a single pattern: in str.contains('gibberish','tech') the second argument is interpreted as the case flag, so only "gibberish" is matched. endswith accepts a tuple of suffixes, so you can put everything you want to exclude in a single tuple and filter on that:

my_df[my_df.columns[~my_df.columns.str.endswith(('_pct','_ln','gibberish','tech'))]]

   id  type  sales  sales_roi  flag
0   1  corp  34567       0.10     0
1   2   smb   2190       0.21     1
2   3   smb   1870       0.22     0
3   4  corp  22000       0.15     1
4   5   mid  10000       0.16     1
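One caveat: endswith only checks suffixes, so it would also drop a column such as "biotech". If "gibberish" and "tech" have to match the whole name, as the question says, one option is to combine endswith with isin. This is only a sketch on a cut-down version of the question's frame; the extra "biotech" column and the names suffixes, exact_names and drop_mask are made up for illustration, and it assumes a pandas version where str.endswith accepts a tuple (documented from 1.5 on):

```python
import pandas as pd

# Cut-down version of the question's frame; "biotech" is a hypothetical
# extra column added to show the difference from a pure suffix match.
my_df = pd.DataFrame({
    'id': [1, 2, 3],
    'sales': [34567, 2190, 1870],
    'sales_pct': [.38, .05, .08],
    'sales_ln': [4.2, 2.1, 2.0],
    'gibberish': ['bla', 'ble', 'bla'],
    'tech': ['lnx', 'mst', 'mst'],
    'biotech': [1, 0, 1],
})

suffixes = ('_pct', '_ln')           # drop columns ending in any of these
exact_names = ['gibberish', 'tech']  # drop columns named exactly like this

cols = my_df.columns
# True for every column that should be removed.
drop_mask = cols.str.endswith(suffixes) | cols.isin(exact_names)
df_selected = my_df.loc[:, ~drop_mask]

print(list(df_selected.columns))  # ['id', 'sales', 'biotech']
```

Keeping the suffixes and the exact names in two separate containers should scale to hundreds of columns, since only those two lists need to change.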

I would do it like this:

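criteria = ["_pct", "_ln", "gibberish", "tech"]

for column in my_df.columns:
    for criterion in criteria:
        # substring test: drop the column if its name contains the criterion
        if criterion in column:
            my_df = my_df.drop(column, axis=1)
            break  # already dropped; a second match would raise KeyError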

Of course, you can change the if statement to use endswith or anything else you prefer. Hope this helps :)
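With hundreds of columns, the same substring test can also be written without the nested loop, by collecting the matching names first and dropping them in one call. A sketch, assuming substring matching is really what you want; the names patterns and to_drop are mine, and the frame is a shortened copy of the one in the question:

```python
import pandas as pd

# Shortened copy of the question's frame.
my_df = pd.DataFrame({
    'id': [1, 2],
    'sales_roi': [.10, .21],
    'sales_pct': [.38, .05],
    'sales_ln': [4.2, 2.1],
    'flag': [0, 1],
    'gibberish': ['bla', 'ble'],
    'tech': ['lnx', 'mst'],
})

patterns = ["_pct", "_ln", "gibberish", "tech"]

# Column names that contain at least one pattern as a substring.
to_drop = [col for col in my_df.columns
           if any(p in col for p in patterns)]

# drop returns a new frame, so my_df itself is left untouched.
df_selected = my_df.drop(columns=to_drop)

print(list(df_selected.columns))  # ['id', 'sales_roi', 'flag']
```

Because each name appears in to_drop at most once, a column that matches several patterns is not a problem here.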
