Trying to Merge or Concat two pyspark.sql.dataframe.DataFrame in Databricks Environment
I have two dataframes in Azure Databricks, both of type pyspark.sql.dataframe.DataFrame.
The number of rows is the same and the indexes match, so I thought one of the code snippets below would do the job.
First Attempt:
result = pd.concat([df1, df2], axis=1)
Error Message: TypeError: cannot concatenate object of type "<class 'pyspark.sql.dataframe.DataFrame'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Second Attempt:
result = pd.merge(df1, df2, left_index=True, right_index=True)
Error Message: TypeError: Can only merge Series or DataFrame objects, a <class 'pyspark.sql.dataframe.DataFrame'> was passed
I ended up converting the two objects to pandas dataframes and then doing the merge the way I already knew how.
Step #1:
df1 = df1.select("*").toPandas()
df2 = df2.select("*").toPandas()
Step #2:
result = pd.concat([df1, df2], axis=1)
Done!
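For completeness, once both objects are pandas DataFrames, Step #2 behaves like this (toy frames standing in for the converted data):

```python
import pandas as pd

# Illustrative stand-ins for the two converted Spark DataFrames
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"c": [5, 6]})

# axis=1 concatenates column-wise, aligning rows on the shared RangeIndex
result = pd.concat([df1, df2], axis=1)
print(result.columns.tolist())  # ['a', 'b', 'c']
```

Bear in mind that toPandas() collects the entire DataFrame onto the driver, so this route only works when both frames fit in driver memory.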
I faced a similar issue when combining two dataframes with the same columns.
df = pd.concat([df, resultant_df], ignore_index=True)
TypeError: cannot concatenate object of type '<class 'pyspark.sql.dataframe.DataFrame'>'; only Series and DataFrame objs are valid
Then I tried join(), but it appended the columns multiple times and returned an empty dataframe.
df.join(resultant_df)
After that I used union(), which gave the exact result I wanted.
df = df.union(resultant_df)
df.show()
It works fine in my case.