简体   繁体   中英

How to dynamically select a subset from pandas dataframe?

I'm new to python and I wanted to do this particular task which doesn't seem obvious to me how to do it. I don't even know what to search in order to find it. First here is the code snippet and I'll explain what I'm aiming for below it:

import pandas as pd

mycolumns = ['col1', 'col2', 'col3']

df = pd.DataFrame(data=[[**1**,2,3,**1**,5,6],[1,2,3,4,5,6]], 
                  columns=['col1_l', 'col2_l', 'col3_l', 'col1_r', 'col2_r', 'col3_r'])

criteria = list()
for col in mycolumns :
     criterion = (df[col + '_l'] == df[col + '_r'])
     criteria.append(criterion)

df = df[criteria[0] | criteria[1] | ... | criteria[5]]

print df

Output:

    col1_l  col2_l  col3_l  col1_r  col2_r col3_r
0     1,      2,     3,      1,      5,     6

What I want is to be able to select the dataframe rows that meet all the specified criteria, but the problem is that the number of columns is not fixed, each run could have different number of columns and I want to do the same each time I execute this. Question is, how can I write this line:

df = df[criteria[0] | criteria[1] | ... | criteria[5]]

Keep in mind that the dataframe is obtained from a join sql query over a database, I just wrote this example dataframe for clarification. Thank you and pardon me if this was obvious.

Use np.logical_or.reduce :

print (df[np.logical_or.reduce(criteria)])
   col1_l  col2_l  col3_l  col1_r  col2_r  col3_r
0       1       2       3       1       5       6

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM