Filter dataframe by line row
Hi, I am a beginner Python user and I need some help. I am trying to filter one dataframe against another.
Df1
date emp# sku transaction#
2017-01-01 10 200 399
2017-01-01 10 201 399
2017-01-01 10 202 399
2017-01-01 11 203 399
2017-01-01 11 200 399
Df2
date emp# sku transaction#
2017-01-01 10 200 301
2017-01-01 11 200 301
Desired Df1
date emp# sku transaction#
2017-01-01 10 200 399
2017-01-01 11 200 399
I know this can work with an inner join (on emp# and sku), but I would end up with extra, unwanted columns. How can I do this as a filter?
Use merge with the on parameter:
Df1.merge(Df2, on=['date','emp#','sku'], suffixes=('','_y'))\
.drop('transaction#_y', axis=1)
Output:
date emp# sku transaction#
0 2017-01-01 10 200 399
1 2017-01-01 11 200 399
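The extra column can also be avoided up front by merging against only the key columns of Df2, so that no suffixes or drop are needed. This is a minimal sketch, assuming the frames are rebuilt from the sample data in the question; the names keys and filtered are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as plain strings here)
df1 = pd.DataFrame({
    "date": ["2017-01-01"] * 5,
    "emp#": [10, 10, 10, 11, 11],
    "sku": [200, 201, 202, 203, 200],
    "transaction#": [399] * 5,
})
df2 = pd.DataFrame({
    "date": ["2017-01-01"] * 2,
    "emp#": [10, 11],
    "sku": [200, 200],
    "transaction#": [301, 301],
})

keys = ["date", "emp#", "sku"]  # hypothetical name for the join keys

# Merge against the key columns only, so df2's transaction# never enters
# the result; drop_duplicates keeps df1 rows from being multiplied.
filtered = df1.merge(df2[keys].drop_duplicates(), on=keys, how="inner")
print(filtered)
```

Note that merge returns a fresh 0..n-1 index, so the original row labels of Df1 are not preserved.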
Here is one way without pd.merge. The benefit of this method is that you don't have to fiddle with column names.
df2 = df2.set_index(['emp#', 'sku'])
df2['transaction#'] = df1.set_index(['emp#', 'sku'])['transaction#']
df2 = df2.reset_index()
# emp# sku date transaction#
# 0 10 200 2017-01-01 399
# 1 11 200 2017-01-01 399
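This approach relies on index alignment, so it is only unambiguous when each (emp#, sku) pair appears once in df1. The sketch below repeats the same steps with a uniqueness check added; the check and the names keys, lookup and result are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question
df1 = pd.DataFrame({
    "date": ["2017-01-01"] * 5,
    "emp#": [10, 10, 10, 11, 11],
    "sku": [200, 201, 202, 203, 200],
    "transaction#": [399] * 5,
})
df2 = pd.DataFrame({
    "date": ["2017-01-01"] * 2,
    "emp#": [10, 11],
    "sku": [200, 200],
    "transaction#": [301, 301],
})

keys = ["emp#", "sku"]

# transaction# from df1, indexed by the (emp#, sku) pair
lookup = df1.set_index(keys)["transaction#"]

# Guard: alignment on a non-unique key would be ambiguous
if not lookup.index.is_unique:
    raise ValueError("duplicate (emp#, sku) pairs in df1")

result = df2.set_index(keys)
result["transaction#"] = lookup  # values are aligned on (emp#, sku)
result = result.reset_index()
print(result)
```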
You can filter by df2 by converting the desired columns into a dictionary, with orient set to list, and then checking whether the values exist using isin. Lastly, take the min of each row to ensure both conditions are met, i.e.:
False + False = False
False + True = False
True + False = False
True + True = True
cols = ['emp#','sku']
df1[df1[cols].isin(df2[cols].to_dict(orient='list')).min(1)]
date emp# sku transaction#
0 2017-01-01 10 200 399
4 2017-01-01 11 200 399
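One thing to keep in mind is that isin with a dictionary tests each column independently, so a row can pass when its emp# and its sku each appear somewhere in df2 without appearing together in the same row. With the sample data this makes no difference. The sketch below uses a deliberately modified, hypothetical df2 to show the effect, and compares it with a row-wise check built on MultiIndex.isin; the names per_column, pairs and row_wise are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01"] * 5,
    "emp#": [10, 10, 10, 11, 11],
    "sku": [200, 201, 202, 203, 200],
    "transaction#": [399] * 5,
})
# Hypothetical df2, changed from the question so the two checks differ:
# the pairs are (10, 200) and (11, 203)
df2 = pd.DataFrame({
    "date": ["2017-01-01"] * 2,
    "emp#": [10, 11],
    "sku": [200, 203],
    "transaction#": [301, 301],
})

cols = ["emp#", "sku"]

# Column-by-column check: emp# in {10, 11} AND sku in {200, 203}
per_column = df1[df1[cols].isin(df2[cols].to_dict(orient="list")).all(axis=1)]

# Row-wise check: the (emp#, sku) pair itself must exist in df2
pairs = pd.MultiIndex.from_frame(df2[cols])
row_wise = df1[pd.MultiIndex.from_frame(df1[cols]).isin(pairs)]

print(per_column.index.tolist())  # includes row 4, i.e. (11, 200)
print(row_wise.index.tolist())    # only rows whose pair is in df2
```

Unlike a merge, both variants keep the original row labels of df1.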
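You want an inner join, which keeps only the rows that exist in both frames. It looks like:
# merge on the shared key columns; df1.join(df2) would join on the index instead
df1.merge(df2[['date', 'emp#', 'sku']], how='inner')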