Pandas DataFrame select rows based on values of multiple columns whose names are specified in a list

Question

I have the following dataframe:

import pandas as pd
import numpy as np
ds = pd.DataFrame({'z':np.random.binomial(n=1,p=0.5,size=10), 
                   'x':np.random.binomial(n=1,p=0.5,size=10), 
                   'u':np.random.binomial(n=1,p=0.5,size=10), 
                   'y':np.random.binomial(n=1,p=0.5,size=10)})
ds

    z   x   u   y
0   0   1   0   0
1   0   1   1   1
2   1   1   1   1
3   0   0   1   1
4   0   0   1   1
5   0   0   0   0
6   1   0   1   1
7   0   1   1   1
8   1   1   0   0
9   0   1   1   1

How do I select the rows where the columns named in a list take the values (0, 1), in that order?

This is what I have thus far:

zs = ['z','x']
tf = ds[ds[zs].values == (0,1)]
tf

Now that prints:

    z   x   u   y
0   0   1   0   0
0   0   1   0   0
1   0   1   1   1
1   0   1   1   1
2   1   1   1   1
3   0   0   1   1
4   0   0   1   1
5   0   0   0   0
7   0   1   1   1
7   0   1   1   1
8   1   1   0   0
9   0   1   1   1
9   0   1   1   1

This shows duplicates and also includes an incorrect row (row 2, which is 1,1,1,1). Any thoughts or ideas? I am assuming there is a Pythonic way of doing this without nested loops or brute force.

Answer 1

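You can use a broadcasted NumPy comparison, reducing across the columns with all so that a row is kept only when every listed column matches:

ds[(ds[['z','x']].values == [0, 1]).all(axis=1)]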

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1
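To see why the reduction with all matters, it may help to look at the intermediate mask. The sketch below hard-codes the sample frame printed in the question (an assumption, since the original was drawn at random) and shows that the comparison yields one boolean per cell rather than one per row. The duplicated rows in the question's output are consistent with a row being picked once for every True cell, though that indexing behavior may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question's printed output (the original was random).
ds = pd.DataFrame({'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
                   'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
                   'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
                   'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]})
zs = ['z', 'x']

# Broadcasting compares each row of the (10, 2) array with [0, 1],
# giving one boolean per cell.
cell_mask = ds[zs].values == [0, 1]
print(cell_mask.shape)  # (10, 2)

# Row positions of every True cell: rows matching both columns appear twice,
# rows matching only one column appear once.
print(np.nonzero(cell_mask)[0])

# Requiring every column to match collapses the mask to one boolean per row.
row_mask = cell_mask.all(axis=1)
result = ds[row_mask]
print(result)
```

The positions printed by np.nonzero are 0, 0, 1, 1, 2, 3, 4, 5, 7, 7, 8, 9, 9, which is the same row sequence as the unwanted output in the question.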

You can also use np.logical_and.reduce:

cols = ['z', 'x']
vals = [0, 1]

ds[np.logical_and.reduce([ds[c] == v for c, v in zip(cols, vals)])]

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1
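If this filter is needed in several places, the np.logical_and.reduce approach could be wrapped in a small function. This is only a sketch: select_rows is a hypothetical helper name, not part of pandas, and the frame is hard-coded from the question's printed output.

```python
import numpy as np
import pandas as pd

def select_rows(frame, cols, vals):
    """Return the rows of frame where frame[c] == v for every (c, v) pair.

    Hypothetical helper, not part of pandas. cols must be non-empty.
    """
    if len(cols) != len(vals):
        raise ValueError("cols and vals must have the same length")
    # One boolean Series per column, combined with an elementwise AND.
    mask = np.logical_and.reduce([frame[c] == v for c, v in zip(cols, vals)])
    return frame[mask]

# Sample frame copied from the question's printed output (the original was random).
ds = pd.DataFrame({'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
                   'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
                   'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
                   'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]})

matched = select_rows(ds, ['z', 'x'], [0, 1])
print(matched)
```

Because the conditions are built per column, the same call works for any number of columns, for example select_rows(ds, ['z', 'x', 'u'], [0, 1, 1]).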

Lastly, assuming your column names are valid Python identifiers, you can dynamically generate a query expression string for use with query:

querystr = ' and '.join([f'{c} == {v!r}' for c, v in zip(cols, vals)])
ds.query(querystr)

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1

Here {v!r} is equivalent to {repr(v)}, so string values are quoted correctly in the expression.
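When a column name is not a valid identifier (for instance, it contains a space), query accepts backtick-quoted names in reasonably recent pandas versions (0.25 and later, as far as I know). The sketch below uses a made-up frame, not the one from the question, to show the same string-building idea with backticks and a string value.

```python
import pandas as pd

# Hypothetical frame: one column name contains a space, one column holds strings.
frame = pd.DataFrame({'group id': [0, 0, 1, 0],
                      'label': ['a', 'b', 'a', 'a'],
                      'score': [1, 2, 3, 4]})
cols = ['group id', 'label']
vals = [0, 'a']

# Backticks quote each column name; {v!r} adds quotes around string values.
querystr = ' and '.join(f'`{c}` == {v!r}' for c, v in zip(cols, vals))
print(querystr)  # `group id` == 0 and `label` == 'a'

result = frame.query(querystr)
print(result)
```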

Answer 2

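You can apply a row-wise check. Note that this example filters on 'u' and 'x', and its output comes from a different random draw of ds than the one shown in the question: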

cols = ['u','x']
bools = ds[cols].apply(lambda x: all(x == (0,1)), axis=1)
ds[bools]

   u  x  y  z
0  0  1  1  1
7  0  1  0  1
8  0  1  1  0

Answer 3

Using eq, which is very similar to the NumPy method in Answer 1:

ds[ds[zs].eq(pd.Series([0, 1], index=zs), axis=1).all(axis=1)]

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1
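Since eq with axis=1 matches the Series index against the column labels, the target values can be kept in a label-indexed Series instead of two parallel lists. A minimal sketch, assuming the frame from the question's printed output; targets is an illustrative name.

```python
import pandas as pd

# Sample frame copied from the question's printed output (the original was random).
ds = pd.DataFrame({'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
                   'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
                   'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
                   'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]})

# Column label -> required value; the order need not follow the frame's columns.
targets = pd.Series({'x': 1, 'z': 0})

# Select the listed columns, compare each with its target, and keep rows
# where every comparison is True.
mask = ds[targets.index].eq(targets, axis=1).all(axis=1)
result = ds[mask]
print(result)
```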

Answer 4

A simpler way, when the column names are known in advance, is to use boolean indexing:

f = ds['z'] == 0
g = ds['x'] == 1
ds[f & g]
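The two conditions can also be written inline. In that form each comparison needs its own parentheses, because & binds more tightly than == in Python. A short sketch, again assuming the frame from the question's printed output:

```python
import pandas as pd

# Sample frame copied from the question's printed output (the original was random).
ds = pd.DataFrame({'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
                   'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
                   'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
                   'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]})

# Without the parentheses, 0 & ds['x'] would be evaluated first.
result = ds[(ds['z'] == 0) & (ds['x'] == 1)]
print(result)
```

For this frame the result contains rows 0, 1, 7 and 9, matching the other answers.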

4 answers

solution1
3 ACCEPTED 2019-01-21 23:34:27

solution2
2 2019-01-21 23:32:43

solution3
1 2019-01-21 23:38:53

solution4
0 2019-01-21 23:43:37
