I have two different Excel files, which I read using pd.read_excel. The first Excel file is a kind of master file with a lot of columns; only the relevant columns are shown here: df1
Company Name Excel Company ID
0 cleverbridge AG IQ109133656
1 BT España, Compañía de Servicios Globales de T... IQ3806173
2 Technoserv Group IQ40333012
3 Blue Media S.A. IQ50008102
4 zeb.rolfes.schierenbeck.associates gmbh IQ30413992
The second Excel file is an output file that looks like this: df2
company_id found_keywords no_of_url company_name
0 IQ137156215 insurance 15 Zühlke Technology Group AG
1 IQ3806173 insurance 15 BT España, Compañía de Servicios Globales de T...
2 IQ40333012 insurance 4 Technoserv Group
3 IQ51614192 insurance 15 Octo Telematics S.p.A.
I want this output Excel file (df2) to also include the company ID and company name from df1 for those companies whose company ID and company name are not already part of df2. Something like this: df2
company_id found_keywords no_of_url company_name
0 IQ137156215 insurance 15 Zühlke Technology Group AG
1 IQ3806173 insurance 15 BT España, Compañía de Servicios Globales de T...
2 IQ40333012 insurance 4 Technoserv Group
3 IQ51614192 insurance 15 Octo Telematics S.p.A.
4 IQ30413992 NaN NaN zeb.rolfes.schierenbeck.associates gmbh
I tried several ways of achieving this, using pd.merge as well as np.where. I even tried reindexing based on columns, but nothing worked. What exactly do I need to do so that it works as expected? Please help me out. Thanks!
EDIT: using pd.merge
df2.merge(df, right_on='company_id', left_on='Excel Company ID', how='outer')
which gave an output with [220 rows X 31 columns]
Your expected output is unclear. If you use pd.merge with how='outer' and indicator=True, you will have:
df1 = df1.rename(columns={'Company Name': 'company_name', 'Excel Company ID': 'company_id'})
out = df2.merge(df1, on=['company_id', 'company_name'], how='outer', indicator=True)
Output:
>>> out
company_id found_keywords no_of_url company_name _merge
0 IQ137156215 insurance 15.0 Zühlke Technology Group AG left_only
1 IQ3806173 insurance 15.0 BT España, Compañía de Servicios Globales de T... both
2 IQ40333012 insurance 4.0 Technoserv Group both
3 IQ51614192 insurance 15.0 Octo Telematics S.p.A. left_only
4 IQ109133656 NaN NaN cleverbridge AG right_only
5 IQ50008102 NaN NaN Blue Media S.A. right_only
6 IQ30413992 NaN NaN zeb.rolfes.schierenbeck.associates gmbh right_only
Check the last column _merge. If you have right_only, it means the company_id and company_name are not found in df2.
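If the goal is to append only the missing companies to df2, one possible follow-up is to filter the merged frame on _merge and concatenate those rows onto df2. The sketch below is an assumption about what the asker wants, not a confirmed solution. It rebuilds small sample frames from the question (the row with the truncated company name is left out), and the names `missing` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; the row whose name is truncated
# with "..." is omitted on purpose.
df1 = pd.DataFrame({
    "Company Name": [
        "cleverbridge AG",
        "Technoserv Group",
        "Blue Media S.A.",
        "zeb.rolfes.schierenbeck.associates gmbh",
    ],
    "Excel Company ID": ["IQ109133656", "IQ40333012", "IQ50008102", "IQ30413992"],
})
df2 = pd.DataFrame({
    "company_id": ["IQ137156215", "IQ40333012", "IQ51614192"],
    "found_keywords": ["insurance", "insurance", "insurance"],
    "no_of_url": [15, 4, 15],
    "company_name": [
        "Zühlke Technology Group AG",
        "Technoserv Group",
        "Octo Telematics S.p.A.",
    ],
})

# Align the column names so both frames can be merged on the same keys.
df1 = df1.rename(columns={"Company Name": "company_name",
                          "Excel Company ID": "company_id"})
out = df2.merge(df1, on=["company_id", "company_name"],
                how="outer", indicator=True)

# Keep only the companies that appear in df1 but not in df2.
missing = out.loc[out["_merge"] == "right_only", ["company_id", "company_name"]]

# Append them to df2; the columns they lack (found_keywords, no_of_url)
# are filled with NaN.
result = pd.concat([df2, missing], ignore_index=True)
print(result)
```

The order of the appended rows can differ between pandas versions, so sort explicitly if the order matters. The result can then be written back with result.to_excel(...) if an Excel file is needed.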