In Pandas, using isin to match a DataFrame to another DataFrame

I have two DataFrames:

local_PC_user_filer_OpCode_sum:

   client_op  clienthostid  eventSum   feeling  usersidid
0       5030             1         1    Happy        5
1       5030             1         2    Mad          5
2       5030             1         8    Sick         6
3       5030             3         9  GoingCrazy     8

df_old_enough_users:

    client_op   clienthostid    eventSum    filerid timestamp   usersidid
0   5030              1             1           1     1/11/2015    5
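For anyone who wants to experiment, the two sample tables above can be rebuilt as DataFrames. This is a transcription of the question's data only; keeping timestamp as a plain string is an assumption, since the date format (day/month or month/day) is not stated.

```python
import pandas as pd

# First table from the question
local_PC_user_filer_OpCode_sum = pd.DataFrame({
    'client_op': [5030, 5030, 5030, 5030],
    'clienthostid': [1, 1, 1, 3],
    'eventSum': [1, 2, 8, 9],
    'feeling': ['Happy', 'Mad', 'Sick', 'GoingCrazy'],
    'usersidid': [5, 5, 6, 8],
})

# Second table from the question: supplies the (usersidid, clienthostid) pairs to keep
df_old_enough_users = pd.DataFrame({
    'client_op': [5030],
    'clienthostid': [1],
    'eventSum': [1],
    'filerid': [1],
    'timestamp': ['1/11/2015'],  # kept as text; the date format is not stated
    'usersidid': [5],
})

print(local_PC_user_filer_OpCode_sum)
print(df_old_enough_users)
```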

Now, what I'm trying to do is take all the rows from local_PC_user_filer_OpCode_sum whose ['usersidid', 'clienthostid'] pair matches a row in df_old_enough_users, so what I would expect to find is:

      client_op  clienthostid  eventSum    feeling       usersidid
0       5030             1         1        Happy          5

I tried to do this with isin:

local_PC_user_filer_OpCode_sum[local_PC_user_filer_OpCode_sum.clienthostid.isin(df_old_enough_users.loc[:,['usersidid','clienthostid']])].reset_index(drop=True)

But I'm getting an empty DataFrame :( What am I doing wrong, and is there a (better) way to do what I need?

Thank you,

You can use join:

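cols = ['usersidid', 'clienthostid']
# index the rows to be filtered by the two key columns
a = local_PC_user_filer_OpCode_sum.set_index(cols)
# left-join each key pair of df_old_enough_users to the matching rows of a
# (overlapping left columns get '_x'), then keep only a's original columns
out = df_old_enough_users.join(a, on=cols, lsuffix='_x')[local_PC_user_filer_OpCode_sum.columns]
print(out.reset_index(drop=True))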

   client_op  clienthostid  eventSum feeling  usersidid
0       5030             1         1   Happy          5
1       5030             1         2     Mad          5
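The join approach can be checked end to end with a self-contained sketch; the two frames are transcribed from the question's tables (timestamp kept as text, an assumption). Because rows 0 and 1 of local_PC_user_filer_OpCode_sum both carry the pair (usersidid=5, clienthostid=1), both are returned.

```python
import pandas as pd

local_PC_user_filer_OpCode_sum = pd.DataFrame({
    'client_op': [5030] * 4,
    'clienthostid': [1, 1, 1, 3],
    'eventSum': [1, 2, 8, 9],
    'feeling': ['Happy', 'Mad', 'Sick', 'GoingCrazy'],
    'usersidid': [5, 5, 6, 8],
})
df_old_enough_users = pd.DataFrame({
    'client_op': [5030], 'clienthostid': [1], 'eventSum': [1],
    'filerid': [1], 'timestamp': ['1/11/2015'], 'usersidid': [5],
})

cols = ['usersidid', 'clienthostid']
a = local_PC_user_filer_OpCode_sum.set_index(cols)
# join(on=...) matches the key columns of the caller against the MultiIndex of a;
# overlapping left columns are renamed with '_x', so unsuffixed ones come from a
joined = df_old_enough_users.join(a, on=cols, lsuffix='_x')
result = joined[local_PC_user_filer_OpCode_sum.columns].reset_index(drop=True)
print(result)
```

Note that join is a left join on df_old_enough_users: a key pair there with no match in local_PC_user_filer_OpCode_sum would still produce a row, filled with NaN.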

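The isin solution does not work because isin does not compare whole rows. Series.isin treats the DataFrame passed to it as a collection of its column labels, so each clienthostid is compared with the strings 'usersidid' and 'clienthostid' and nothing matches; DataFrame.isin with a DataFrame argument, in turn, only matches values whose index and column labels line up in both DataFrames.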
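A minimal sketch of both points, using the question's sample data: first the original call, which returns all False, then one way to keep an isin-style filter by turning each (usersidid, clienthostid) pair into a single MultiIndex entry. The MultiIndex route is an alternative suggested here, not part of either answer.

```python
import pandas as pd

local_PC_user_filer_OpCode_sum = pd.DataFrame({
    'client_op': [5030] * 4,
    'clienthostid': [1, 1, 1, 3],
    'eventSum': [1, 2, 8, 9],
    'feeling': ['Happy', 'Mad', 'Sick', 'GoingCrazy'],
    'usersidid': [5, 5, 6, 8],
})
df_old_enough_users = pd.DataFrame({
    'client_op': [5030], 'clienthostid': [1], 'eventSum': [1],
    'filerid': [1], 'timestamp': ['1/11/2015'], 'usersidid': [5],
})
cols = ['usersidid', 'clienthostid']

# The question's approach: iterating a DataFrame yields its column labels,
# so the integers in clienthostid are compared with two strings
wrong_mask = local_PC_user_filer_OpCode_sum.clienthostid.isin(df_old_enough_users[cols])
print(wrong_mask.any())

# Row-wise matching: build one MultiIndex entry per key pair on each side
keys = pd.MultiIndex.from_frame(local_PC_user_filer_OpCode_sum[cols])
wanted = pd.MultiIndex.from_frame(df_old_enough_users[cols])
mask = keys.isin(wanted)  # boolean NumPy array, one entry per row
result = local_PC_user_filer_OpCode_sum[mask].reset_index(drop=True)
print(result)
```

Unlike a merge, this never duplicates rows, even if df_old_enough_users contains the same key pair more than once.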

If you are interested in modifying @jezrael's answer, this might give you a cleaner result:

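import pandas as pd
cols = ['usersidid', 'clienthostid']
# how="inner" keeps only rows whose (usersidid, clienthostid) pair exists in df_old_enough_users;
# drop_duplicates() stops a repeated key pair there from duplicating rows
df = pd.merge(local_PC_user_filer_OpCode_sum,
              df_old_enough_users[cols].drop_duplicates(),
              on=cols,
              how="inner")[local_PC_user_filer_OpCode_sum.columns]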

df will have exactly the columns of the original local_PC_user_filer_OpCode_sum DataFrame, and only rows whose (usersidid, clienthostid) pair appears in the right-hand table, the one used as the filter, are returned.
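The choice of how= matters once the filter table contains keys with no match. The sketch below uses the question's first table and a HYPOTHETICAL filter table, filter_keys, holding the question's pair (5, 1) plus an invented pair (9, 2) that matches nothing; it compares an inner merge with a right merge.

```python
import pandas as pd

local_PC_user_filer_OpCode_sum = pd.DataFrame({
    'client_op': [5030] * 4,
    'clienthostid': [1, 1, 1, 3],
    'eventSum': [1, 2, 8, 9],
    'feeling': ['Happy', 'Mad', 'Sick', 'GoingCrazy'],
    'usersidid': [5, 5, 6, 8],
})
# HYPOTHETICAL filter: (5, 1) comes from df_old_enough_users, (9, 2) is invented
filter_keys = pd.DataFrame({'usersidid': [5, 9], 'clienthostid': [1, 2]})

cols = ['usersidid', 'clienthostid']
keep = local_PC_user_filer_OpCode_sum.columns

# inner: only key pairs present on both sides survive
inner = pd.merge(local_PC_user_filer_OpCode_sum, filter_keys.drop_duplicates(),
                 on=cols, how='inner')[keep]
# right: every filter key survives; (9, 2) becomes a row of NaN values
right = pd.merge(local_PC_user_filer_OpCode_sum, filter_keys.drop_duplicates(),
                 on=cols, how='right')[keep]

print(inner)
print(right)
```

For a pure "keep matching rows" filter, how='inner' avoids the NaN rows; how='right' is only preferable if unmatched filter keys should show up in the output.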
