
How to dynamically select a subset from a pandas DataFrame?

I'm new to Python and want to do a particular task, but it isn't obvious to me how to do it. I don't even know what to search for in order to find it. Here is the code snippet first; I'll explain what I'm aiming for below it:

import pandas as pd

mycolumns = ['col1', 'col2', 'col3']

df = pd.DataFrame(data=[[1, 2, 3, 1, 5, 6], [1, 2, 3, 4, 5, 6]],
                  columns=['col1_l', 'col2_l', 'col3_l', 'col1_r', 'col2_r', 'col3_r'])

# Build one boolean criterion per column pair
criteria = list()
for col in mycolumns:
    criterion = (df[col + '_l'] == df[col + '_r'])
    criteria.append(criterion)

# The line in question (pseudocode): OR together every criterion in the list
df = df[criteria[0] | criteria[1] | ... | criteria[5]]

print(df)

Output:

   col1_l  col2_l  col3_l  col1_r  col2_r  col3_r
0       1       2       3       1       5       6

What I want is to select the dataframe rows that meet at least one of the specified criteria. The problem is that the number of columns is not fixed: each run could have a different number of columns, and I want to do the same thing every time I execute this. The question is, how can I write this line:

df = df[criteria[0] | criteria[1] | ... | criteria[5]]

Keep in mind that the dataframe is obtained from a SQL join query over a database; I just wrote this example dataframe for clarification. Thank you, and pardon me if this was obvious.

Use np.logical_or.reduce:

import numpy as np

print(df[np.logical_or.reduce(criteria)])
   col1_l  col2_l  col3_l  col1_r  col2_r  col3_r
0       1       2       3       1       5       6
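
For reference, the approach above can be put together as a minimal end-to-end sketch, assuming the example data from the question. The list comprehension and the names `mask` and `result` are illustrative choices, not part of the original answer; `mycolumns` stands in for whatever column list a given run produces.

```python
import numpy as np
import pandas as pd

# Base column names; in practice this list may differ from run to run.
mycolumns = ['col1', 'col2', 'col3']

# Example data from the question, with '_l' and '_r' variants of each column.
df = pd.DataFrame(
    data=[[1, 2, 3, 1, 5, 6], [1, 2, 3, 4, 5, 6]],
    columns=[c + s for s in ('_l', '_r') for c in mycolumns],
)

# One boolean Series per column pair, however many pairs there are.
criteria = [df[col + '_l'] == df[col + '_r'] for col in mycolumns]

# Element-wise OR across the whole list: one boolean per row.
mask = np.logical_or.reduce(criteria)

# Keep only the rows where at least one pair matches.
result = df[mask]
print(result)
```

Because the reduction runs over the whole list, the same line works for any number of criteria. If every criterion had to hold instead, `np.logical_and.reduce(criteria)` would be the analogous call.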


 