简体   繁体   中英

Pandas - Filter based on multiple conditions

I am facing a problem when I need to exclude some rows from by dataframe.

here the code:

import numpy as np
import pandas as pd
dff = {
    'ID': [1, 2, 3, 4, 5, 6, 7],
    'Default': [1,1,0,1,0,1,0],
    'Default_Amount': [1200,2000,0,350,0,760,0],
    'Tot_Amount': [1200,2000,3400,350,10000,760,7500],
    'Time' : ['October','March','November','November','January','December','January'],
    'Class': ['A','B','A','A','B','B','A']
}
dff = pd.DataFrame(dff)
display(dff)

dff[(dff.Time != 'November') & (dff.Default == 1) ]

What I am trying to do, is to exclude the row in dataframe that has the following two conditions: time = November with default = 1 (that is the ID 4 in the dataframe).

But if I execute this code " dff[(dff.Time != 'November') & (dff.Default == 1) ] ", it excludes also the other time = "November" and default = 0.

How can I avoid this problem?

Thanks in advance!

You need match not equal dff.Default != 1 with bitwise OR by | :

df = dff[(dff.Time != 'November') | (dff.Default != 1) ]

Or invert mask, but change | to & for bitwise AND and change != to == :

df = dff[~((dff.Time == 'November') & (dff.Default == 1)) ]

This is simple boolean algebra

Either follow De Morgan's law and invert both conditions while getting an OR ( | ):

dff[(dff.Time != 'November') | (dff.Default != 1) ]

or add ( & ) your two conditions, and negate ( ~ ):

dff[~((dff.Time == 'November') & (dff.Default == 1)) ]

Why don't you simply use

dff[dff.ID != 4]

or otherwise you can use

dff[~((dff.Time == 'November') & (dff.Default == 1))]

The problem with you code is that (dff.Time != 'November') first selects all non-November rows and then (dff.Default == 1) selects all row with Default '1' only.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM