
Join two dataframes in pyspark

I have two data frames:

df1

+----+----+
|key1|val1|
+----+----+
|a1  |   1|
|b1  |   2|
+----+----+

df2

+----+----+
|key2|val2|
+----+----+
|a2  |   3|
|b2  |   4|
+----+----+

I want to merge these two data frames so that every row of df1 is paired with every row of df2, giving the following data frame:

df3

+----+----+----+----+
|key1|val1|key2|val2|
+----+----+----+----+
|a1  |   1|a2  |   3|
|a1  |   1|b2  |   4|
|b1  |   2|a2  |   3|
|b1  |   2|b2  |   4|
+----+----+----+----+

How can I do this in PySpark?

Use a cross join, which pairs every row of df1 with every row of df2:

df3 = df1.crossJoin(df2)
df3.show()

This should give the output you want.

