
Join two dataframes in PySpark

I have two data frames:

df1

+----+----+
|key1|val1|
+----+----+
|a1  |   1|
|b1  |   2|
+----+----+

df2

+----+----+
|key2|val2|
+----+----+
|a2  |   3|
|b2  |   4|
+----+----+

I want to combine these two data frames so that every row of df1 is paired with every row of df2, giving the following data frame:

df3

+----+----+----+----+
|key1|val1|key2|val2|
+----+----+----+----+
|a1  |   1|a2  |   3|
|a1  |   1|b2  |   4|
|b1  |   2|a2  |   3|
|b1  |   2|b2  |   4|
+----+----+----+----+

How can I do this in PySpark?

Try a cross join, as below:

# Cartesian product: pairs every row of df1 with every row of df2
df3 = df1.crossJoin(df2)
df3.show()

This should give the output you want: crossJoin returns the Cartesian product of the two data frames, so df3 has 2 x 2 = 4 rows.
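To make the pairing explicit, here is a minimal plain-Python sketch of what a cross join computes. It is not Spark code: the rows from the question are held as ordinary tuples, and `cross_join`, `df1_rows`, `df2_rows` and `df3_rows` are hypothetical names introduced only for this illustration.

```python
from itertools import product

# Rows of df1 and df2 from the question, as plain tuples.
# Illustration only: these are not Spark DataFrames.
df1_rows = [("a1", 1), ("b1", 2)]
df2_rows = [("a2", 3), ("b2", 4)]


def cross_join(left, right):
    """Pair every row of `left` with every row of `right` (Cartesian product)."""
    # Concatenating the two tuples places the left columns first,
    # followed by the right columns: (key1, val1, key2, val2).
    return [l_row + r_row for l_row, r_row in product(left, right)]


df3_rows = cross_join(df1_rows, df2_rows)
for row in df3_rows:
    print(row)
```

The result always has len(df1_rows) * len(df2_rows) rows, which is why a cross join on large data frames can get expensive quickly. Also note that this sketch yields rows in a fixed order, whereas Spark should not be relied on to return rows in any particular order unless you sort explicitly.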
