In Pandas, using isin to match dataframe to other dataframe
I have 2 dataframes:
local_PC_user_filer_OpCode_sum:
   client_op  clienthostid  eventSum     feeling  usersidid
0       5030             1         1       Happy          5
1       5030             1         2         Mad          5
2       5030             1         8        Sick          6
3       5030             3         9  GoingCrazy          8
df_old_enough_users:
   client_op  clienthostid  eventSum  filerid  timestamp  usersidid
0       5030             1         1        1  1/11/2015          5
Now, what I'm trying to do is take all the rows from local_PC_user_filer_OpCode_sum that have a match on [['usersidid','clienthostid']] with df_old_enough_users, so what I would expect to find is:
   client_op  clienthostid  eventSum feeling  usersidid
0       5030             1         1   Happy          5
I tried to do so with isin:
local_PC_user_filer_OpCode_sum[local_PC_user_filer_OpCode_sum.clienthostid.isin(df_old_enough_users.loc[:,['usersidid','clienthostid']])].reset_index(drop=True)
But I'm getting an empty dataframe :( What am I doing wrong, and is there a (better) way to do what I need?
Thank you,
You can use join:
cols = ['usersidid', 'clienthostid']
a = local_PC_user_filer_OpCode_sum.set_index(cols)
print (df_old_enough_users.join(a, on=cols, lsuffix='_x')[local_PC_user_filer_OpCode_sum.columns].reset_index(drop=True))
   client_op  clienthostid  eventSum  filerid feeling  usersidid
0       5030             1         1        1   Happy          5
1       5030             1         2        1     Mad          5
The isin solution does not work because, when isin is given a DataFrame, the column and index labels of both DataFrames have to match as well, not just the values.
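To make the label-matching point concrete, here is a minimal sketch on frames rebuilt from the tables in the question. The short names local and old are stand-ins I chose for local_PC_user_filer_OpCode_sum and df_old_enough_users. It first shows DataFrame.isin comparing row label by row label, then one possible isin-based workaround that compares the key pairs as tuples. It is an illustration under those assumptions, not the answerer's method.

```python
import pandas as pd

# Sample frames rebuilt from the question (short names are stand-ins).
local = pd.DataFrame({
    "client_op": [5030, 5030, 5030, 5030],
    "clienthostid": [1, 1, 1, 3],
    "eventSum": [1, 2, 8, 9],
    "feeling": ["Happy", "Mad", "Sick", "GoingCrazy"],
    "usersidid": [5, 5, 6, 8],
})
old = pd.DataFrame({
    "client_op": [5030],
    "clienthostid": [1],
    "eventSum": [1],
    "filerid": [1],
    "timestamp": ["1/11/2015"],
    "usersidid": [5],
})

cols = ["usersidid", "clienthostid"]

# DataFrame.isin with a DataFrame argument aligns on index and column
# labels, so only row 0 of `local` is compared against row 0 of `old`.
# Row 1 has the same key values but a different label, so it is False.
aligned = local[cols].isin(old[cols])
print(aligned.all(axis=1).tolist())

# Workaround: compare (usersidid, clienthostid) pairs as tuples,
# which ignores the row labels entirely.
local_keys = pd.MultiIndex.from_frame(local[cols])
old_keys = list(old[cols].itertuples(index=False, name=None))
mask = local_keys.isin(old_keys)

matched = local[mask].reset_index(drop=True)
print(matched)
```

On this sample data the tuple comparison keeps both rows with usersidid 5 and clienthostid 1 (Happy and Mad), the same rows the join returns.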
If you are interested in modifying @jezrael's answer, this might give you a cleaner answer.
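import pandas as pd

# Merge on the key pair, taking only the key columns from the filter frame,
# then select the columns of the original frame (note the double brackets).
df = pd.merge(local_PC_user_filer_OpCode_sum,
              df_old_enough_users[['usersidid','clienthostid']],
              on=['usersidid','clienthostid'],
              how="right")[["client_op", "clienthostid", "eventSum", "feeling", "usersidid"]]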
df will have exactly the columns of your original local_PC_user_filer_OpCode_sum dataframe, and the only rows returned will be those whose keys appear in the right table, which you used as the filter.
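As a variation on the merge above, the sketch below uses how="inner" and de-duplicates the filter keys first. With how="right", a key pair that exists only in the filter table would come back as a row of NaN values, and repeated key pairs in the filter would repeat the matching rows. The frames are rebuilt from the question, and the names local, old and keys are stand-ins chosen for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question (short names are stand-ins).
local = pd.DataFrame({
    "client_op": [5030, 5030, 5030, 5030],
    "clienthostid": [1, 1, 1, 3],
    "eventSum": [1, 2, 8, 9],
    "feeling": ["Happy", "Mad", "Sick", "GoingCrazy"],
    "usersidid": [5, 5, 6, 8],
})
old = pd.DataFrame({
    "client_op": [5030],
    "clienthostid": [1],
    "eventSum": [1],
    "filerid": [1],
    "timestamp": ["1/11/2015"],
    "usersidid": [5],
})

cols = ["usersidid", "clienthostid"]

# Keep each key pair once so the merge cannot multiply rows of `local`.
keys = old[cols].drop_duplicates()

# An inner merge keeps only rows of `local` whose key pair is in `keys`;
# since `keys` holds nothing but the key columns, no extra columns appear.
filtered = local.merge(keys, on=cols, how="inner")
print(filtered)
```

Because only the key columns are taken from the filter frame, the result has the same columns as local without a separate column selection.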