Select a subset of a Pandas DataFrame based on a list of criteria built from another DataFrame
Suppose we have the following DataFrame, df_org:
>>> import pandas as pd
>>> df_org = pd.DataFrame({'A' : [1,2,3,4,5,6],
...                        'B' : [1,1,1,1,2,2],
...                        'C' : [1,2,3,4,1,2]})
>>> df_org
   A  B  C
0  1  1  1
1  2  1  2
2  3  1  3
3  4  1  4
4  5  2  1
5  6  2  2
And here is another one, df_criteria, which has some of the columns of df_org and from which we will build our criteria. For instance:
>>> df_criteria = pd.DataFrame({'B' : [1,2],
...                             'C' : [1,1]})
>>> df_criteria
   B  C
0  1  1
1  2  1
I'd like to fetch the values of A in df_org for the rows whose values of B and C match one of the (B, C) pairs listed in df_criteria.
In this example, I would like to get the subset of df_org that contains its rows 0 and 4, like so:
   A  B  C
0  1  1  1
4  5  2  1
Being a newbie in pandas, I implemented this with a for-loop mindset: iterating over the rows of df_criteria and querying df_org for each row. However, this is very slow, and I have the impression that there must be a more pythonic (and faster) way that does not rely on for-loops.
I've also explored DataFrame.lookup, but it is not useful in my case because the indices of df_criteria and df_org do not necessarily match.
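The asker's actual loop was not posted, so the following is only a hypothetical reconstruction of the row-by-row approach described above, useful as a baseline to compare the vectorized answer against. It assumes B and C are the columns to match on and builds one boolean mask per criteria row.

```python
import pandas as pd

df_org = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                       'B': [1, 1, 1, 1, 2, 2],
                       'C': [1, 2, 3, 4, 1, 2]})
df_criteria = pd.DataFrame({'B': [1, 2],
                            'C': [1, 1]})

# Row-by-row baseline: one full scan of df_org per row of df_criteria,
# which is what makes this slow when both frames are large.
matches = []
for _, crit in df_criteria.iterrows():
    # Boolean mask: rows of df_org whose B and C equal this criteria row
    mask = (df_org['B'] == crit['B']) & (df_org['C'] == crit['C'])
    matches.append(df_org[mask])

# Stack the per-criterion matches; the original df_org row labels are kept
result_loop = pd.concat(matches)
print(result_loop)
```

This reproduces rows 0 and 4 of df_org, but its cost grows with len(df_criteria) × len(df_org).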
Any suggestion would be very much appreciated. Many thanks!
A simple inner merge on the shared columns B and C works:
In [285]:
df_org.merge(df_criteria, on=['B','C'])
Out[285]:
A B C
0 1 1 1
1 5 2 1
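Note that merge returns a fresh 0..n-1 index, so the matching rows come back labelled 0 and 1 rather than 0 and 4 as the question asked. Below is a minimal sketch of two ways to keep the original labels, assuming B and C are the only columns to match on. Option 1 carries the index through the merge as a column. Option 2 is a swapped-in technique not from the original answer: a boolean mask built by testing (B, C) tuple membership with MultiIndex.isin.

```python
import pandas as pd

df_org = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                       'B': [1, 1, 1, 1, 2, 2],
                       'C': [1, 2, 3, 4, 1, 2]})
df_criteria = pd.DataFrame({'B': [1, 2],
                            'C': [1, 1]})
keys = ['B', 'C']  # columns shared by both frames

# Option 1: merge, but carry df_org's original index through as a column.
# drop_duplicates() prevents repeated criteria rows from duplicating matches.
result_merge = (df_org.reset_index()
                .merge(df_criteria[keys].drop_duplicates(), on=keys)
                .set_index('index')
                .rename_axis(None))

# Option 2: boolean mask from (B, C) tuple membership; df_org's index is untouched.
wanted = list(df_criteria[keys].itertuples(index=False, name=None))
mask = pd.MultiIndex.from_frame(df_org[keys]).isin(wanted)
result_isin = df_org[mask]

print(result_merge)
print(result_isin)
```

Option 2 never duplicates rows and preserves df_org's row order; Option 1 stays closer to the merge-based answer and extends naturally if columns from df_criteria also need to be attached.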