
Left Join errors out: org.apache.spark.sql.AnalysisException: Detected implicit cartesian product

My left join only works if I either set spark.sql.crossJoin.enabled=true or call persist() on one of the DataFrames.

SELECT * FROM LHS left join RHS on LHS.R = RHS.R

How do I make the left join work without setting spark.sql.crossJoin.enabled=true and without persisting a DataFrame?

The exception below occurs in both Spark 2.3.3 and 2.4.4.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for LEFT OUTER join between logical plans OneRowRelation and ... Join condition is missing or trivial. Either: use the CROSS JOIN syntax to allow cartesian products between these relations, or: enable implicit cartesian products by setting the configuration variable spark.sql.crossJoin.enabled=true;

Using the DataFrame API on Spark 2.4.3:

scala> var lhs = spark.createDataFrame(Seq((1,"sda"),(2,"abc"))).toDF("id","value")
scala> var rhs = spark.createDataFrame(Seq((2,"abc"),(3,"xyz"))).toDF("id1","value1")

scala> lhs.join(rhs,col("id")===col("id1"),"left_outer")

scala> lhs.join(rhs,col("id")===col("id1"),"left_outer").show
+---+-----+----+------+
| id|value| id1|value1|
+---+-----+----+------+
|  1|  sda|null|  null|
|  2|  abc|   2|   abc|
+---+-----+----+------+

This runs without any issue for me.
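
For comparison with the SQL in the question, the same left join can also be written against temporary views. This is only a minimal sketch: the view names LHS and RHS come from the question, while the columns id and id1 are the ones from my sample data above, not the asker's actual column R. The SparkSession builder line is an assumption added so the snippet can run outside spark-shell; inside spark-shell it simply returns the existing session.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("left-join-sql-sketch")
  .getOrCreate()

// Same sample data as in the DataFrame example above.
val lhs = spark.createDataFrame(Seq((1, "sda"), (2, "abc"))).toDF("id", "value")
val rhs = spark.createDataFrame(Seq((2, "abc"), (3, "xyz"))).toDF("id1", "value1")

// Register both DataFrames as temporary views so they can be referenced from SQL.
lhs.createOrReplaceTempView("LHS")
rhs.createOrReplaceTempView("RHS")

// Left join with an explicit equality condition on real columns from both sides.
val joined = spark.sql("SELECT * FROM LHS LEFT JOIN RHS ON LHS.id = RHS.id1")
joined.show()
```

If this form works on your data but your original query does not, it may be worth checking how the column R is produced on each side, since the error message says the join condition is missing or trivial.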

