Filter dataframe by row

Hi, I am a beginner Python user and I need some help. I am trying to filter one dataframe against another.

Df1

 date          emp#   sku     transaction#   
 2017-01-01    10     200     399              
 2017-01-01    10     201     399             
 2017-01-01    10     202     399             
 2017-01-01    11     203     399             
 2017-01-01    11     200     399            

Df2

 date          emp#   sku     transaction#
 2017-01-01    10     200     301
 2017-01-01    11     200     301

Desired Df1

 date          emp#   sku     transaction#
 2017-01-01    10     200     399
 2017-01-01    11     200     399

I know this can work with an inner join (on emp# and sku), but I would end up with extraneous columns. How can I do this as a filter?

Use merge with the on parameter, then drop the suffixed column that comes from Df2:

Df1.merge(Df2, on=['date','emp#','sku'], suffixes=('','_y'))\
   .drop('transaction#_y', axis=1)

Output:

         date  emp#  sku  transaction#
0  2017-01-01    10  200           399
1  2017-01-01    11  200           399
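As a self-contained sketch of the merge approach, the frames below are rebuilt from the sample data in the question. Restricting Df2 to its key columns before merging is one possible variation (an assumption, not part of the answer above) that avoids creating the suffixed column in the first place. The names keys and filtered are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
Df1 = pd.DataFrame({
    "date": ["2017-01-01"] * 5,
    "emp#": [10, 10, 10, 11, 11],
    "sku": [200, 201, 202, 203, 200],
    "transaction#": [399] * 5,
})
Df2 = pd.DataFrame({
    "date": ["2017-01-01"] * 2,
    "emp#": [10, 11],
    "sku": [200, 200],
    "transaction#": [301, 301],
})

keys = ["date", "emp#", "sku"]  # illustrative name for the join columns

# Merge against only the key columns of Df2, so no extra columns are added.
# drop_duplicates guards against repeated keys in Df2 multiplying rows of Df1.
filtered = Df1.merge(Df2[keys].drop_duplicates(), on=keys, how="inner")
print(filtered)
```

Because only the key columns of Df2 take part, the result has the same columns as Df1 and no drop step is needed.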

Here is one way without pd.merge. The benefit of this method is that you don't have to play around with column names.

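# Index both frames by the key columns so the assignment aligns on (emp#, sku)
df2 = df2.set_index(['emp#', 'sku'])
# Overwrite df2's transaction# with df1's; (emp#, sku) must be unique in df1
df2['transaction#'] = df1.set_index(['emp#', 'sku'])['transaction#']
df2 = df2.reset_index()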

#    emp#  sku        date  transaction#
# 0    10  200  2017-01-01           399
# 1    11  200  2017-01-01           399

You can filter df1 against df2 by converting the desired columns of df2 into a dictionary, with orient set to 'list', and then checking whether the values exist using isin. Lastly, take the min of each row (for booleans this behaves like a logical AND across the columns) to ensure both conditions are met, i.e.

  1. False + False = False
  2. False + True = False
  3. True + False = False
  4. True + True = True

cols = ['emp#', 'sku']
df1[df1[cols].isin(df2[cols].to_dict(orient='list')).min(axis=1)]

         date  emp#  sku  transaction#
0  2017-01-01    10  200           399
4  2017-01-01    11  200           399
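One caveat worth checking: isin with a dictionary tests each column independently, so a row can pass when its emp# and its sku each appear somewhere in df2 but never together in the same row. The sketch below uses a hypothetical df2 (not the data from the question) to show the difference, and compares it with a pair-wise test built on pd.MultiIndex.from_frame. The names columnwise and pairwise are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "emp#": [10, 10, 11],
    "sku": [200, 201, 200],
})
# Hypothetical df2: the pair (10, 200) is absent,
# although 10 and 200 each appear in their own column
df2 = pd.DataFrame({
    "emp#": [10, 11],
    "sku": [201, 200],
})
cols = ["emp#", "sku"]

# Column-wise check: each column is tested on its own
columnwise = df1[cols].isin(df2[cols].to_dict(orient="list")).min(axis=1)

# Pair-wise check: each (emp#, sku) row is tested as a unit
pairwise = pd.MultiIndex.from_frame(df1[cols]).isin(
    pd.MultiIndex.from_frame(df2[cols])
)

print(columnwise.tolist())  # [True, True, True]
print(pairwise.tolist())    # [False, True, True]
```

With the data from the question both checks select the same rows, so the difference only matters when key values can recombine across rows. The pair-wise mask can be used in the same way, e.g. df1[pairwise].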

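You need an inner join, which keeps only the rows that are present in both dataframes:

# df1.join(df2, how='inner') joins on the index and raises on overlapping column
# names, so match on the key columns of df2 instead
df1.merge(df2[['emp#', 'sku']], on=['emp#', 'sku'], how='inner')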
