How to check if a combined value of two columns exists in another DataFrame in pandas?
I have multiple DataFrames that share two common columns, user_id and event_date.
Sample df:
user_id | event_date
abc | 1st June
abc | 2nd June
cdf | 15th July
dfg | 17th July
I want to check whether a user_id on a particular event_date in df1 also exists in df2, df3, df4, and df5.
How do I find this?
I tried the following methods, but they only took user_id into consideration and not event_date.
Method 1:
upi_sms =df1.assign(Insms=df2.user_id.isin(df1.user_id).astype(int))
Method 2: merging the DataFrames with on = [user_id, event_date]
Neither of them gives me the expected result.
Expected result:
The combination of abc and 1st June should exist in df2.
How do I achieve this?
I would do it the following way. Consider a simple example:
import pandas as pd
df1 = pd.DataFrame({'x':['A','B','C'],'y':[1,2,3]})
df2 = pd.DataFrame({'x':['C','A','B'],'y':[3,2,1]})
df3 = pd.DataFrame({'x':['A','B','C'],'y':[0,0,0]})
Say you are interested in the last row of df1, i.e. where x is C and y is 3. Such a row is also present in df2 (as its first row) but not in df3, where there is a row with x being C but with a different y.
row = tuple(df1.iloc[-1]) # get last row of df1 as tuple
print(row in df2.itertuples(index=False)) # True
print(row in df3.itertuples(index=False)) # False
Observe that it is important to pass index=False. Otherwise each tuple would also contain the row's index, and we do not want a row's position inside the pandas.DataFrame to affect the comparison.
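The check above handles one row at a time. If you need a flag for every row of df1 against several DataFrames, one possible sketch is to turn the two columns into a pandas.MultiIndex and use its isin method. The sample data, the helper name pair_exists, and the output column names in_df2 and in_df3 are assumptions made for illustration; only user_id and event_date come from the question. Matching is exact, so "1st june" and "1st June" would count as different values.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "user_id": ["abc", "abc", "cdf", "dfg"],
    "event_date": ["1st June", "2nd June", "15th July", "17th July"],
})
df2 = pd.DataFrame({
    "user_id": ["abc", "cdf"],
    "event_date": ["1st June", "16th July"],
})
df3 = pd.DataFrame({
    "user_id": ["dfg", "abc"],
    "event_date": ["17th July", "2nd June"],
})

keys = ["user_id", "event_date"]


def pair_exists(left, right, keys):
    """Return a boolean array: True where a row's key combination in
    `left` also occurs somewhere in `right`. (Illustrative helper.)"""
    left_pairs = pd.MultiIndex.from_frame(left[keys])
    right_pairs = pd.MultiIndex.from_frame(right[keys])
    return left_pairs.isin(right_pairs)


# Add one flag column per DataFrame that df1 is compared against.
result = df1.copy()
for name, other in {"in_df2": df2, "in_df3": df3}.items():
    result[name] = pair_exists(df1, other, keys)

print(result)
```

Because both columns are compared together as a pair, cdf with 15th July is not flagged for df2 even though cdf appears there with another date. Unlike a merge, this keeps df1 at its original length even when the other DataFrame contains duplicate pairs.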