My description of the issue may not be precise, but the example below shows what I need. I need a solution in PySpark.
I have two dataframes:
Df1
A B C
1 2 3
5 6 7
8 9 1
6 2 3
Df2
D E
a b
c d
e f
I want the final dataframe to look like this:
A B C D E
1 2 3 a b
1 2 3 c d
1 2 3 e f
5 6 7 a b
5 6 7 c d
5 6 7 e f
8 9 1 a b
8 9 1 c d
8 9 1 e f
6 2 3 a b
6 2 3 c d
6 2 3 e f
Basically, in the new dataframe each row of Df1 is repeated once for each row of Df2. The final row count would be: count(Df1) * count(Df2)
Please help, I am new to PySpark.
You can use crossJoin, which returns the Cartesian product of the two dataframes: Df1.crossJoin(Df2). See the documentation: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html