Need help converting a merge function from R to Python, shape of resulting df is the same but losing more rows in Python after dropping duplicates

Question

I believe the merge in R is a left outer join. The merge I implemented in Python returns a dataframe with the same shape as the merged data frame in R. However, when I drop duplicates (df2.drop_duplicates()), 4000 rows are dropped in Python, as opposed to the 50 rows dropped when I apply the equivalent drop-duplicates function to the post-merge R data frame.

The dataframes I need to merge are df1 and df2.

R:
df2 <- merge(df2[, -which(names(df2) %in% c(column9, column10))], df1[, c(column1, column2, column4, column5)], by.x=c(column1, column2), by.y=c(column2, column4), all.x=T)

Python:
df2 = df2[[column1,column2,column3...column8]].merge(df1[[column1,column2,column4,column5]],how='left',left_on=[column1,column2],right_on=[column2,column4])

df2[column1] and df2[column2] are the columns I want to merge on. The matching columns in df1 have different names, df1[column2] and df1[column4], but hold the same row values.

My gut tells me the issue stems from this portion of the code, which I might be misinterpreting: -which(names(df2) %in% c(column9,column10))

Please feel free to send some tips my way if I'm messing up somewhere.

Answer 1

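First, selecting columns with a list of labels is no longer recommended when some of the labels may be missing: since pandas 1.0, df[[...]] raises a KeyError if any label is absent. Instead, use reindex to subset columns, which handles missing labels.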
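To illustrate the difference, here is a minimal sketch on a toy frame. The column names "a", "b", and "missing" are made up for the example, and the KeyError behaviour assumes pandas 1.0 or later.

```python
import pandas as pd

# Toy frame; column names are hypothetical.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# List selection fails as soon as one requested label is absent
# (assumes pandas >= 1.0).
try:
    df[["a", "missing"]]
    raised = False
except KeyError:
    raised = True

# reindex keeps every requested label and fills the absent one with NaN.
subset = df.reindex(["a", "missing"], axis="columns")

print(raised)
print(subset)
```

If every label is guaranteed to exist, both forms select the same columns, so the choice only matters when a label can be absent.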

The R expression -which(names(df2) %in% c(column9, column10)) translates to ~df2.columns.isin([column9, column10]) in pandas. Because isin on the column index returns a boolean array, use DataFrame.loc to subset with it:

df2 = (df2.loc[:, ~df2.columns.isin([column9, column10])]
          .merge(df1.reindex([column1, column2, column4, column5], axis='columns'),
                 how='left',
                 left_on=[column1, column2],
                 right_on=[column2, column4])
      )
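The same pattern can be tried end to end on toy data. This is only a sketch: the frames and all column names (key_a, key_b, unwanted, val, id_a, id_b, extra) are hypothetical stand-ins for the real df1 and df2.

```python
import pandas as pd

# Toy frames; every column name here is hypothetical.
left = pd.DataFrame({
    "key_a": [1, 2, 3],
    "key_b": ["x", "y", "z"],
    "unwanted": [0, 0, 0],
    "val": [10, 20, 30],
})
right = pd.DataFrame({
    "id_a": [1, 2],
    "id_b": ["x", "y"],
    "extra": ["p", "q"],
})

# Boolean mask over the column labels: True for the columns to keep.
mask = ~left.columns.isin(["unwanted"])

# Left join on differently named key columns.
merged = left.loc[:, mask].merge(
    right,
    how="left",
    left_on=["key_a", "key_b"],
    right_on=["id_a", "id_b"],
)

print(merged.columns.tolist())
print(merged.shape)
# Number of rows that drop_duplicates() would remove.
print(merged.duplicated().sum())
```

Note that pandas keeps both sets of key columns when left_on and right_on use different names, whereas R's merge keeps only the by.x names. Comparing the column lists of the two results, and counting duplicates with duplicated().sum() before dropping them, may help narrow down where the two versions diverge.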
