Using isin to match a DataFrame against another DataFrame in pandas

Question

I have two DataFrames:

local_PC_user_filer_OpCode_sum:

   client_op  clienthostid  eventSum     feeling  usersidid
0       5030             1         1       Happy          5
1       5030             1         2         Mad          5
2       5030             1         8        Sick          6
3       5030             3         9  GoingCrazy          8

df_old_enough_users:

   client_op  clienthostid  eventSum  filerid  timestamp  usersidid
0       5030             1         1        1  1/11/2015          5

What I'm trying to do is take all the rows from local_PC_user_filer_OpCode_sum whose ['usersidid', 'clienthostid'] values match a row in df_old_enough_users. What I would expect to get is:

   client_op  clienthostid  eventSum feeling  usersidid
0       5030             1         1   Happy          5

I tried to do this with isin:

local_PC_user_filer_OpCode_sum[local_PC_user_filer_OpCode_sum.clienthostid.isin(df_old_enough_users.loc[:,['usersidid','clienthostid']])].reset_index(drop=True)

But I'm getting an empty DataFrame. What am I doing wrong, and is there a (better) way to do what I need?

Thank you,

Answer 1

You can use join:

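cols = ['usersidid', 'clienthostid']
# Index the frame to filter by the key columns, then join on those keys.
a = local_PC_user_filer_OpCode_sum.set_index(cols)
# lsuffix renames overlapping left-hand columns; keep only the original columns.
joined = df_old_enough_users.join(a, on=cols, lsuffix='_x')
print(joined[local_PC_user_filer_OpCode_sum.columns].reset_index(drop=True))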

   client_op  clienthostid  eventSum feeling  usersidid
0       5030             1         1   Happy          5
1       5030             1         2     Mad          5

The isin approach does not work here because isin tests values element by element, not whole rows across several columns. When a DataFrame is passed to DataFrame.isin, the index and column labels must also match in both DataFrames.
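The point about index matching can be illustrated with a small sketch. The frames are rebuilt from the question's sample data under shortened, hypothetical names (local_df, old_df). The MultiIndex-based mask at the end is an alternative that is not part of the answer above, shown only as one way to test whole key pairs.

```python
import pandas as pd

# Sample frames rebuilt from the question (names shortened for illustration).
local_df = pd.DataFrame({
    "client_op": [5030, 5030, 5030, 5030],
    "clienthostid": [1, 1, 1, 3],
    "eventSum": [1, 2, 8, 9],
    "feeling": ["Happy", "Mad", "Sick", "GoingCrazy"],
    "usersidid": [5, 5, 6, 8],
})
old_df = pd.DataFrame({
    "client_op": [5030],
    "clienthostid": [1],
    "eventSum": [1],
    "filerid": [1],
    "timestamp": ["1/11/2015"],
    "usersidid": [5],
})

cols = ["usersidid", "clienthostid"]

# DataFrame.isin with a DataFrame aligns on index and column labels first,
# so only rows sharing an index label with old_df can ever be True.
aligned_mask = local_df[cols].isin(old_df[cols]).all(axis=1)

# Alternative: treat each (usersidid, clienthostid) pair as one MultiIndex
# entry and test membership of whole pairs, regardless of row position.
local_keys = pd.MultiIndex.from_frame(local_df[cols])
old_keys = pd.MultiIndex.from_frame(old_df[cols])
pair_mask = local_keys.isin(old_keys)

matched = local_df[pair_mask].reset_index(drop=True)
print(matched)
```

Here aligned_mask is True only for the row labelled 0, even though the row labelled 1 carries the same key pair, while pair_mask selects both rows.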

Answer 2

If you are interested in modifying @jezrael's answer, this might give you a cleaner result.

import pandas as pd

cols = ['usersidid', 'clienthostid']
# Merge on the key pair, then keep the original columns in their original order.
df = pd.merge(local_PC_user_filer_OpCode_sum,
              df_old_enough_users[cols],
              on=cols,
              how="right")[local_PC_user_filer_OpCode_sum.columns]

df will have exactly the columns of your original local_PC_user_filer_OpCode_sum DataFrame, and its rows are limited to key pairs that appear in the right-hand table, which acts as the filter. Note that how="right" also keeps filter rows that have no match, filled with NaN; how="inner" avoids that.
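A self-contained sketch of the merge-as-filter idea follows, with frames rebuilt from the question's sample data under shortened, hypothetical names (local_df, old_df). It assumes how="inner" and a drop_duplicates() on the key columns, neither of which appears in the answer above; both are there so that unmatched or repeated filter keys cannot add rows.

```python
import pandas as pd

# Sample frames rebuilt from the question (names shortened for illustration).
local_df = pd.DataFrame({
    "client_op": [5030, 5030, 5030, 5030],
    "clienthostid": [1, 1, 1, 3],
    "eventSum": [1, 2, 8, 9],
    "feeling": ["Happy", "Mad", "Sick", "GoingCrazy"],
    "usersidid": [5, 5, 6, 8],
})
old_df = pd.DataFrame({
    "client_op": [5030],
    "clienthostid": [1],
    "eventSum": [1],
    "filerid": [1],
    "timestamp": ["1/11/2015"],
    "usersidid": [5],
})

cols = ["usersidid", "clienthostid"]

# Keep only the key columns of the filter table, one row per distinct pair.
keys = old_df[cols].drop_duplicates()

# Inner merge on the key pair: rows of local_df survive only if their
# (usersidid, clienthostid) pair exists in keys.
filtered = local_df.merge(keys, on=cols, how="inner")
print(filtered)
```

Because the right-hand side contributes only the key columns, the result keeps the columns of local_df and no suffixes are needed.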
