I believe the merge type in R is a left outer join. The merge I implemented in Python returned a dataframe with the same shape as the merged data frame in R. However, when I dropped duplicates (df2.drop_duplicates()), 4000 rows were dropped in Python, as opposed to only 50 rows when I applied the drop-duplicates function to the post-merge R data frame.
The dataframes I need to merge are df1 and df2.
R:
df2 <- merge(df2[, -which(names(df2) %in% c(column9, column10))], df1[, c(column1, column2, column4, column5)], by.x = c(column1, column2), by.y = c(column2, column4), all.x = TRUE)
Python:
df2 = df2[[column1,column2,column3...column8]].merge(df1[[column1,column2,column4,column5]], how='left', left_on=[column1,column2], right_on=[column2,column4])
df2[column1] and df2[column2] are the columns I want to merge on. In df1 the matching columns are named column2 and column4, but they hold the same row values.
My gut tells me the issue stems from this portion of the R code, which I might be misinterpreting: -which(names(df2) %in% c(column9,column10))
Please feel free to send some tips my way if I'm messing up somewhere
First, selecting columns with a plain list is strict in pandas: if any label in the list is missing from the dataframe, recent versions raise a KeyError. To subset columns in a way that tolerates missing labels, use reindex, which returns any missing column filled with NaN.
Second, the R expression -which(names(df2) %in% c(column9, column10)) can be translated to ~df2.columns.isin([column9, column10]) in pandas. Because isin returns a boolean array, use DataFrame.loc to subset the columns with it:
df2 = (df2.loc[:, ~df2.columns.isin([column9, column10])]
          .merge(df1.reindex([column1, column2, column4, column5], axis='columns'),
                 how='left',
                 left_on=[column1, column2],
                 right_on=[column2, column4])
      )
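To make the pattern above concrete, here is a minimal, self-contained sketch on toy data. The frames and every column name in it (id_a, id_b, keep, drop_me, key_a, key_b, value) are hypothetical placeholders, not the asker's real columns, so treat it as an illustration of the technique rather than a drop-in fix.

```python
import pandas as pd

# Hypothetical toy data; the column names are placeholders for illustration only.
df1 = pd.DataFrame({
    "key_a": [1, 2, 3],
    "key_b": ["x", "y", "z"],
    "value": [10.0, 20.0, 30.0],
})
df2 = pd.DataFrame({
    "id_a": [1, 2, 4],
    "id_b": ["x", "y", "w"],
    "keep": ["p", "q", "r"],
    "drop_me": [0, 0, 0],
})

# Counterpart of R's df2[, -which(names(df2) %in% c("drop_me"))]:
# keep every column whose name is NOT in the list.
left = df2.loc[:, ~df2.columns.isin(["drop_me"])]

# reindex selects columns by label and tolerates labels that are absent.
right = df1.reindex(["key_a", "key_b", "value"], axis="columns")

# Left join on differently named key columns.
merged = left.merge(
    right,
    how="left",
    left_on=["id_a", "id_b"],
    right_on=["key_a", "key_b"],
)

print(merged.shape)
print(merged)
```

One thing worth checking when comparing against R: because the key names differ, pandas keeps both sets of key columns (id_a, id_b and key_a, key_b) in the result, whereas R's merge keeps only the by.x names. Rows of df2 with no match in df1 get NaN in the columns that came from df1.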