

Select a subset of a Pandas DataFrame based on a list of criteria built from another DataFrame

Suppose we have the following DataFrame:

>>> import pandas as pd
>>> df_org = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
...                        'B': [1, 1, 1, 1, 2, 2],
...                        'C': [1, 2, 3, 4, 1, 2]})
>>> df_org
   A  B  C
0  1  1  1
1  2  1  2
2  3  1  3
3  4  1  4
4  5  2  1
5  6  2  2

And another one, df_criteria, which has some of the columns of df_org and from which we will build our criteria. For instance:

>>> df_criteria = pd.DataFrame({'B': [1, 2],
...                             'C': [1, 1]})
>>> df_criteria
   B  C
0  1  1
1  2  1

I'd like to fetch the values of A in df_org for the rows whose B and C values match one of the (B, C) pairs listed in df_criteria. In this example, I would like a subset of df_org that contains its rows 0 and 4, like so:

   A  B  C
0  1  1  1
4  5  2  1

Being new to pandas, I implemented this with a for-loop mindset: iterating over the rows of df_criteria and querying df_org for each row. However, this is very slow, and I have the impression that there must be a more pythonic (and faster) way that does not use for-loops. I've also explored DataFrame.lookup, but it is not useful in my case because the indices of df_criteria and df_org do not necessarily match.
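The question does not show the loop code, so the following is only a minimal sketch of what that row-by-row approach typically looks like, assuming the two example frames above; `select_with_loop` is a hypothetical helper name. It builds one boolean mask per criteria row and concatenates the matches, which is why it scales poorly as df_criteria grows.

```python
import pandas as pd

df_org = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                       'B': [1, 1, 1, 1, 2, 2],
                       'C': [1, 2, 3, 4, 1, 2]})
df_criteria = pd.DataFrame({'B': [1, 2],
                            'C': [1, 1]})


def select_with_loop(df, criteria):
    """Hypothetical row-by-row filter: one full scan of df per criteria row."""
    parts = []
    for _, row in criteria.iterrows():
        # Start with all-True and AND in one equality test per criteria column
        mask = pd.Series(True, index=df.index)
        for col in criteria.columns:
            mask &= df[col] == row[col]
        parts.append(df[mask])
    # Stitch the per-row matches together; df's original index labels survive
    return pd.concat(parts)


subset_loop = select_with_loop(df_org, df_criteria)
print(subset_loop)
```

Each iteration scans all of df_org, so the cost grows with len(df_org) × len(df_criteria); the vectorized approaches avoid that Python-level loop.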

Any suggestion would be very much appreciated. Many thanks!

A simple inner merge works:

In [285]: df_org.merge(df_criteria, on=['B','C'])
Out[285]:
   A  B  C
0  1  1  1
1  5  2  1
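Note that merge returns a new default index, so the matches above are labelled 0 and 1 rather than the 0 and 4 the question asked for. A minimal sketch of two ways to keep df_org's original labels, assuming the same example frames; `keys`, `merged`, `mask` and `subset` are illustrative names, not part of the original answer. Option 1 carries the index through the merge; option 2 swaps in a different technique, a membership test on a MultiIndex of the key columns, which avoids the join entirely.

```python
import pandas as pd

df_org = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                       'B': [1, 1, 1, 1, 2, 2],
                       'C': [1, 2, 3, 4, 1, 2]})
df_criteria = pd.DataFrame({'B': [1, 2],
                            'C': [1, 1]})

keys = ['B', 'C']  # columns that must match between the two frames

# Option 1: reset_index() moves df_org's row labels into a column named
# 'index'; set_index() restores them after the inner merge.
# drop_duplicates() stops repeated criteria rows from duplicating matches.
merged = (df_org.reset_index()
                .merge(df_criteria[keys].drop_duplicates(), on=keys)
                .set_index('index'))
merged.index.name = None  # drop the leftover 'index' label

# Option 2: build (B, C) tuples for both frames and test membership;
# this is a plain boolean mask, so df_org's index and row order are untouched.
mask = df_org.set_index(keys).index.isin(df_criteria.set_index(keys).index)
subset = df_org[mask]

print(merged)
print(subset)
```

Option 2 is often the simpler choice when you only need to filter df_org; the merge in option 1 is more useful when you also want to pull extra columns from df_criteria into the result.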
