Need help converting a merge function from R to Python: the resulting df has the same shape, but more rows are lost in Python after dropping duplicates

Question

I believe the merge type in R is a left outer join. The merge I implemented in Python returned a dataframe with the same shape as the merged df in R. However, when I dropped the duplicates (df2.drop_duplicates), 4000 rows were dropped in Python, compared with only 50 rows when I applied the drop-duplicates function to the post-merge R data frame.

The dataframes I need to merge are df1 and df2.

R:
df2 <- merge(df2[, -which(names(df2) %in% c(column9, column10))], df1[, c(column1, column2, column4, column5)], by.x = c(column1, column2), by.y = c(column2, column4), all.x = T)

Python:
df2 = df2[[column1,column2,column3...column8]].merge(df1[[column1,column2,column4,column5]],how='left',left_on=[column1,column2],right_on=[column2,column4])

df2[column1] and df2[column2] are the columns I want to merge on; the matching columns in df1 are named df1[column2] and df1[column4] but hold the same row values.

My gut tells me that the issue stems from this portion of the code, which I might be misinterpreting: -which(names(df2) %in% c(column9,column10))

Please feel free to send some tips my way if I'm messing up somewhere.

Answer 1

First, selecting columns with a list of labels is no longer recommended in Pandas when some of the labels may be missing. Instead, use reindex to subset columns, which handles missing labels.
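To illustrate the difference, here is a minimal sketch on a toy frame. The column names ("a", "b", "missing") are made up for the example and are not from the question.

```python
import pandas as pd

# Toy frame; the column names are hypothetical stand-ins.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# List selection raises KeyError when a requested label is absent.
try:
    df1[["a", "missing"]]
    list_selection_failed = False
except KeyError:
    list_selection_failed = True

# reindex keeps every requested label and fills absent ones with NaN.
subset = df1.reindex(["a", "missing"], axis="columns")

print(list_selection_failed)
print(subset.columns.tolist())
```

Note that reindex silently creates an all-NaN column for a missing label, so a typo in a column name will not raise an error.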

And the Pandas translation of R's -which(names(df2) %in% c(column9, column10)) can be ~df2.columns.isin([column9, column10]). Because isin returns a boolean array, use DataFrame.loc to subset:

df2 = (df2.loc[:, ~df2.columns.isin([column9, column10])]
         .merge(df1.reindex([column1, column2, column4, column5], axis='columns'),
                how='left', 
                left_on=[column1, column2], 
                right_on=[column2, column4])
      )
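The same pattern can be tried end to end on toy data. This is only a sketch: the frames and column names ("k1", "k2", "drop_me", "r1", "r2", "extra") are invented for illustration, and it also counts fully duplicated rows before dropping them, which may help locate where the R and Python results diverge.

```python
import pandas as pd

# Hypothetical toy data; none of these names come from the question.
left = pd.DataFrame({
    "k1": [1, 1, 2],
    "k2": ["x", "x", "y"],
    "val": [10, 10, 30],
    "drop_me": [0, 1, 2],
})
right = pd.DataFrame({
    "r1": [1, 2],
    "r2": ["x", "y"],
    "extra": ["p", "q"],
})

# Pandas counterpart of R's left[, -which(names(left) %in% c("drop_me"))]
kept = left.loc[:, ~left.columns.isin(["drop_me"])]

# Left join on differently named key columns.
merged = kept.merge(
    right.reindex(["r1", "r2", "extra"], axis="columns"),
    how="left",
    left_on=["k1", "k2"],
    right_on=["r1", "r2"],
)

# Count rows that are duplicates across all columns, then drop them.
n_dupes = int(merged.duplicated().sum())
deduped = merged.drop_duplicates()

print(merged.columns.tolist())
print(n_dupes, len(deduped))
```

Two things to keep in mind when comparing with R: when the key names differ, pandas keeps both the left and right key columns in the result, whereas R's merge keeps only the by.x names. Also, drop_duplicates is a method, so it has to be called with parentheses and its result assigned, as in the last step above.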

Solution 1
0 Accepted 2020-10-02 15:14:41