简体   繁体   中英

Pandas filtering with multiple conditions

I'm trying to filter data with multiple conditions using .isin

I've created a dataframe with data like this.

    col_a   col_b   col_c
    abc     yes     a
    abc     no      b
    abc     yes     a
    def     no      b
    def     yes     a
    def     no      b
    def     yes     a
    def     no      b
    ghi     yes     a
    ghi     no      b
    ghi     yes     a

When I try this type of filtering, referring to this solution I seen on stack overflow I get back all NaN values. Pandas: Filtering multiple conditions

How can I apply the three conditions to filter?

fil_1 = test.isin({'col_a': ['abc','def','ghi']})
fil_2 = test.isin({'col_b': ['yes']})
fil_3 = test.isin({'col_c' :['a']})
data = test[fil_1 & fil_2 & fil_3]
data

You need:

fil_1 = test['col_a'].isin(['abc','def','ghi'])
fil_2 = test['col_b'].isin(['yes'])
fil_3 = test['col_c'].isin(['a'])

or

test.isin({'col_a': ['abc','def','ghi'],
           'col_b': ['yes'],
           'col_c' :['a']}).all(axis = 1)

df_filtered = test[fil_1 & fil_2 & fil_3]
print(df_filtered)
   col_a col_b col_c
0    abc   yes     a
2    abc   yes     a
4    def   yes     a
6    def   yes     a
8    ghi   yes     a
10   ghi   yes     a

or logic |

fil = test.isin({'col_a': ['abc','def','ghi'],'col_b': ['yes'],'col_c' :['a']})
df_filtered = df[fil]
print(df_filtered)

   col_a col_b col_c
0    abc   yes     a
1    abc   NaN   NaN
2    abc   yes     a
3    def   NaN   NaN
4    def   yes     a
5    def   NaN   NaN
6    def   yes     a
7    def   NaN   NaN
8    ghi   yes     a
9    ghi   NaN   NaN
10   ghi   yes     a

Now if we also use DataFrame.all :

df_filtered = df[fil.all(axis = 1)]
print(df_filtered)
   col_a col_b col_c
0    abc   yes     a
2    abc   yes     a
4    def   yes     a
6    def   yes     a
8    ghi   yes     a
10   ghi   yes     a

Detail

print(fil)
    col_a  col_b  col_c
0    True   True   True
1    True  False  False
2    True   True   True
3    True  False  False
4    True   True   True
5    True  False  False
6    True   True   True
7    True  False  False
8    True   True   True
9    True  False  False
10   True   True   True

print(test.isin({'col_a': ['abc','def','ghi']}))
    col_a  col_b  col_c
0    True  False  False
1    True  False  False
2    True  False  False
3    True  False  False
4    True  False  False
5    True  False  False
6    True  False  False
7    True  False  False
8    True  False  False
9    True  False  False
10   True  False  False

this return False in columns differences than col_a so you got NaN values ​​because you were using &

这是单线解决方案,

test[test.col_a.isin(['abc','def','ghi']) & test.col_b.isin(['yes']) & test.col_c.isin(['a'])]

A possible solution to filter your dataframe is the following: "cond1" select all the values in col_a that are either "abc" or "def" or "ghi". Then col_b is only "yes", and col_c is only "a".

cond1=(apd.col_a=="abc") | (apd.col_a=="def") | (apd.col_a=="ghi")

apd[ cond1 & (apd.col_b=="yes") & (apd.col_c=="a")]

Result:

    col_a   col_b   col_c
0   abc     yes     a
2   abc     yes     a
4   def     yes     a
6   def     yes     a
8   ghi     yes     a
10  ghi     yes     a

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM