
How to append one dataframe into another dataframe as a column

My description of the issue may be off, but what I need is shown below. I need a solution in PySpark.

I have two DataFrames:

Df1
A B C
1 2 3
5 6 7
8 9 1
6 2 3

Df2
D E
a b
c d
e f

I want the final DataFrame to look like this:

A B C D E
1 2 3 a b
1 2 3 c d
1 2 3 e f
5 6 7 a b
5 6 7 c d
5 6 7 e f
8 9 1 a b
8 9 1 c d
8 9 1 e f
6 2 3 a b
6 2 3 c d
6 2 3 e f

Basically, in the new DataFrame every row of Df1 is repeated once for each row of Df2. The final row count would be count(Df1) * count(Df2).

Please help, I am new to PySpark.

You can use crossJoin, which returns the Cartesian product of the two DataFrames: df1.crossJoin(df2). See the documentation: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html
