
Check if one dataframe exists in another

I have two dataframes, Overall and df2.

Overall:

Time                ID_1    ID_2               
2020-02-25 09:24:14 140209  81625000
2020-02-25 09:24:14 140216  91625000
2020-02-25 09:24:18 140219  80250000
2020-02-25 09:24:18 140221  90250000
25/02/2020 09:42:02     143982  39075000

df2:

ID_1    ID_2            Time                  Match?
140209  81625000    25/02/2020 09:24:14    no_match
143983  44075000    25/02/2020 09:42:02    no_match
143982  39075000    25/02/2020 09:42:02    match
143984  39075000    25/02/2020 09:42:02    no_match

I want to check whether each row of Overall exists in df2 and, if so, whether the Match? column of that same df2 row says match. If it does, a new column should say yes; if it does not say match, the new column should say no.

I have tried:

Overall_1 = pds.merge(Overall, df2, on=….., how='left', indicator= 'Exist')
Overall_1.drop([...], inplace = True, axis =1 )
Overall_1['Exist']= np.where((Overall_1.Exist =='both') & (Overall_1.Match? == match), 'yes', 'no')

But I get this error:

TypeError: Cannot perform 'rand_' with a dtyped [bool] array and scalar of type [float]

So the resulting Overall_1 dataframe should look like:

Time                ID_1    ID_2             Exist   
2020-02-25 09:24:14 140209  81625000     No
2020-02-25 09:24:14 140216  91625000     NaN
2020-02-25 09:24:18 140219  80250000     NaN
2020-02-25 09:24:18 140221  90250000     NaN
25/02/2020 09:42:02     143982  39075000     Yes

Using merge and np.select:

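import numpy as np
import pandas as pd

# df1 = Overall
# A left merge keeps every row of df1; the indicator column 'Exists'
# records whether the key was found in both frames or only in df1.
df3 = pd.merge(df1,df2,on=['ID_1','ID_2','Time'],how='left',indicator='Exists')

col1 = df3['Match?']
col2 = df3['Exists']

# Rows present in both frames are labelled according to their 'Match?' value.
conditions = [(col1.eq('match') & (col2.eq('both'))),
              (col1.eq('no_match') & (col2.eq('both')))
             ]

choices = ['yes','no']

# Rows that satisfy neither condition receive the default value.
df3['Exists'] = np.select(conditions,choices,default=np.nan)

print(df3.drop('Match?',axis=1))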


                 Time    ID_1      ID_2 Exists
0 2020-02-25 09:24:14  140209  81625000     no
1 2020-02-25 09:24:14  140216  91625000    nan
2 2020-02-25 09:24:18  140219  80250000    nan
3 2020-02-25 09:24:18  140221  90250000    nan
4 2020-02-25 09:42:02  143982  39075000    yes
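The merge above only finds matches when the Time columns of both frames hold the same values with the same dtype, while the question shows two different timestamp formats. Below is a minimal, self-contained sketch of one way to normalize them first. It assumes the last Overall timestamp is stored in ISO form like the others (the question shows it as 25/02/2020), and it swaps np.select for Series.map to derive the label, which leaves unmatched rows as NaN. The names overall, merged and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question. Assumption: every Overall timestamp
# is in ISO form, so a single format string parses the whole column.
overall = pd.DataFrame({
    "Time": ["2020-02-25 09:24:14", "2020-02-25 09:24:14",
             "2020-02-25 09:24:18", "2020-02-25 09:24:18",
             "2020-02-25 09:42:02"],
    "ID_1": [140209, 140216, 140219, 140221, 143982],
    "ID_2": [81625000, 91625000, 80250000, 90250000, 39075000],
})
df2 = pd.DataFrame({
    "ID_1": [140209, 143983, 143982, 143984],
    "ID_2": [81625000, 44075000, 39075000, 39075000],
    "Time": ["25/02/2020 09:24:14", "25/02/2020 09:42:02",
             "25/02/2020 09:42:02", "25/02/2020 09:42:02"],
    "Match?": ["no_match", "no_match", "match", "no_match"],
})

# Parse both Time columns to datetime64 so equal instants compare equal.
overall["Time"] = pd.to_datetime(overall["Time"], format="%Y-%m-%d %H:%M:%S")
df2["Time"] = pd.to_datetime(df2["Time"], format="%d/%m/%Y %H:%M:%S")

# Left merge keeps all Overall rows; unmatched rows get NaN in 'Match?'.
merged = overall.merge(df2, on=["ID_1", "ID_2", "Time"], how="left")

# Translate the df2 label; values missing from the dict (NaN) stay NaN.
merged["Exists"] = merged["Match?"].map({"match": "yes", "no_match": "no"})

result = merged.drop(columns="Match?")
print(result)
```

Parsing with explicit format strings avoids day/month ambiguity in the dd/mm/yyyy column.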

Or, more simply, using merge with a replace dict:

df3 = pd.merge(df1,df2,on=['ID_1','ID_2','Time'],how='left')\
                                      .replace({'no_match' : 'no', 
                                                'match' : 'yes'})\
                                      .rename(columns={'Match?' : 'Exists'})

print(df3)

                 Time    ID_1      ID_2 Exists
0 2020-02-25 09:24:14  140209  81625000     no
1 2020-02-25 09:24:14  140216  91625000    NaN
2 2020-02-25 09:24:18  140219  80250000    NaN
3 2020-02-25 09:24:18  140221  90250000    NaN
4 2020-02-25 09:42:02  143982  39075000    yes
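One thing to keep in mind with the flat replace dict is that DataFrame.replace looks for those values in every column of the merged frame. If that is a concern, a nested dict limits the replacement to one column. The sketch below uses hypothetical cut-down frames (the Time column is left out for brevity) and the illustrative name scoped.

```python
import pandas as pd

# Hypothetical cut-down frames based on the question's data; Time is omitted.
df1 = pd.DataFrame({"ID_1": [140209, 140216, 143982],
                    "ID_2": [81625000, 91625000, 39075000]})
df2 = pd.DataFrame({"ID_1": [140209, 143982],
                    "ID_2": [81625000, 39075000],
                    "Match?": ["no_match", "match"]})

scoped = (
    df1.merge(df2, on=["ID_1", "ID_2"], how="left")
       # Nested dict: only values in the 'Match?' column are replaced.
       .replace({"Match?": {"no_match": "no", "match": "yes"}})
       .rename(columns={"Match?": "Exists"})
)
print(scoped)
```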

You can try:

df_diff = pd.concat([Overall,df2]).drop_duplicates(keep=False)
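To illustrate what this suggestion returns: with keep=False, drop_duplicates discards every row that occurs more than once, so the result holds the rows found in only one of the two frames rather than a yes/no column. The sketch below is a rough illustration under a few assumptions: both frames are restricted to the shared key columns, Time is stored as identically formatted strings, and neither frame contains internal duplicates. The names keys, overall and df_diff are illustrative.

```python
import pandas as pd

keys = ["Time", "ID_1", "ID_2"]

# Data from the question, with Time written in one string format (assumption).
overall = pd.DataFrame({
    "Time": ["2020-02-25 09:24:14", "2020-02-25 09:24:14",
             "2020-02-25 09:24:18", "2020-02-25 09:24:18",
             "2020-02-25 09:42:02"],
    "ID_1": [140209, 140216, 140219, 140221, 143982],
    "ID_2": [81625000, 91625000, 80250000, 90250000, 39075000],
})
df2 = pd.DataFrame({
    "ID_1": [140209, 143983, 143982, 143984],
    "ID_2": [81625000, 44075000, 39075000, 39075000],
    "Time": ["2020-02-25 09:24:14", "2020-02-25 09:42:02",
             "2020-02-25 09:42:02", "2020-02-25 09:42:02"],
    "Match?": ["no_match", "no_match", "match", "no_match"],
})

# Compare on the key columns only; rows present in both frames occur twice
# after the concat and are dropped entirely by keep=False.
df_diff = pd.concat([overall[keys], df2[keys]]).drop_duplicates(keep=False)
print(df_diff)
```

Here the two rows shared by both frames disappear, leaving three rows unique to Overall and two unique to df2.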


