How to append one dataframe into another dataframe as a column
Maybe my issue description is wrong, but it is what is described below. I need a solution in PySpark.
I have 2 data frames:
Df1
A B C
1 2 3
5 6 7
8 9 1
6 2 3
Df2
D E
a b
c d
e f
I want the final dataframe as below:
A B C D E
1 2 3 a b
1 2 3 c d
1 2 3 e f
5 6 7 a b
5 6 7 c d
5 6 7 e f
8 9 1 a b
8 9 1 c d
8 9 1 e f
6 2 3 a b
6 2 3 c d
6 2 3 e f
Basically, in the new dataframe each row of Df1 will be repeated for each row of Df2. The final count would be:
count(Df1) * count(Df2)
Please help, I am new to PySpark.
You can use crossJoin, which is df.crossJoin(df2). You can check this: https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.crossJoin.html .