
Join two DataFrames using Spark Scala

I have this code:

   val o = p_value.alias("d1").join(t_d.alias("d2"),
      (col("d1.origin_latitude") === col("d2.origin_latitude") &&
       col("d1.origin_longitude") === col("d2.origin_longitude")), "left").
      filter(col("d2.origin_longitude").isNull)
   val c = p_value2.alias("d3").join(o.alias("d4"),
      (col("d3.origin_latitude") === col("d4.origin_latitude") &&
       col("d3.origin_longitude") === col("d4.origin_longitude")), "left").
      filter(col("d3.origin_longitude").isNull)

I get this error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'd4.origin_latitude' is ambiguous, could be: d4.origin_latitude, d4.origin_latitude.;
at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:101)

On this line:

 (col("d3.origin_latitude")===col("d4.origin_latitude") && col("d3.origin_longitude")===col("d4.origin_longitude")),"left").

Any idea?

Thank you.

You are aliasing the DataFrames, not the columns. A DataFrame alias only gives you a qualifier for referring to that DataFrame's columns; it does not rename them. The first join therefore produces a DataFrame that contains each column name twice (origin_latitude as well as origin_longitude). When that result is aliased as d4, both copies become d4.origin_latitude, so any attempt to reference one of them raises the ambiguity error.
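To make the duplication visible, here is a minimal sketch that joins two toy DataFrames the same way the question does and lists the resulting column names. The object name, the sample rows and the local SparkSession are assumptions for illustration only; they are not taken from the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DuplicateColumnsSketch {
  // Joins two toy DataFrames on an explicit column condition (as in the question)
  // and returns the column names of the result.
  def joinedColumns(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    // Hypothetical stand-ins for p_value and t_d
    val left  = Seq((1.0, 2.0)).toDF("origin_latitude", "origin_longitude")
    val right = Seq((1.0, 2.0)).toDF("origin_latitude", "origin_longitude")
    val joined = left.alias("d1").join(
      right.alias("d2"),
      col("d1.origin_latitude") === col("d2.origin_latitude") &&
        col("d1.origin_longitude") === col("d2.origin_longitude"),
      "left")
    joined.columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("duplicate-columns").getOrCreate()
    println(joinedColumns(spark).mkString(", "))
    // prints: origin_latitude, origin_longitude, origin_latitude, origin_longitude
    spark.stop()
  }
}
```

Both sides keep their own copy of the join columns, which is why aliasing the result again leaves two columns with the same qualified name.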

So you need to make sure that the DataFrame contains each column only once. Joining on a Seq of column names, instead of on a column expression, keeps a single copy of each join column in the result. You can rewrite the first join as below:

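p_value
      .join(t_d, Seq("origin_latitude", "origin_longitude"), "left")
      // t_d is not aliased here, so the column is looked up on the DataFrame itself,
      // without a "t_d." qualifier
      .filter(t_d.col("origin_longitude").isNull)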
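If the goal of the left join plus null filter is simply "rows of p_value with no match in t_d", one alternative worth considering is Spark's "left_anti" join type, which returns only the left side's columns and so cannot produce duplicate names. The sketch below is an assumption about the intent, not the original author's method; the object name, the helper `unmatched` and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AntiJoinSketch {
  // Returns the rows of `left` that have no row in `right` with the same
  // origin_latitude and origin_longitude. Only `left`'s columns are kept.
  def unmatched(left: DataFrame, right: DataFrame): DataFrame =
    left.join(right, Seq("origin_latitude", "origin_longitude"), "left_anti")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("anti-join").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for p_value and t_d
    val pValue = Seq((1.0, 2.0), (3.0, 4.0)).toDF("origin_latitude", "origin_longitude")
    val tD     = Seq((1.0, 2.0)).toDF("origin_latitude", "origin_longitude")

    // Only the (3.0, 4.0) row has no counterpart in tD
    unmatched(pValue, tD).show()
    spark.stop()
  }
}
```

Because the result contains each column once, aliasing it as d4 in a later join should leave d4.origin_latitude unambiguous. Note that, unlike the left join with a null filter, the anti join does not carry any of the right side's columns along.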
