How to Join Multiple Columns in Spark SQL using Java for filtering in DataFrame
DataFrame a contains columns x, y, z, k
DataFrame b contains columns x, y, a
a.join(b, <condition in Java using columns x and y>) ???
I tried using
a.join(b, a.col("x").equalTo(b.col("x")) && a.col("y").equalTo(b.col("y")), "inner")
But Java throws an error saying && is not allowed.
Spark SQL provides a group of methods on Column, marked as java_expr_ops, which are designed for Java interoperability. It includes an and method (see also or) which can be used here:
a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y")))
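To make the condition-based join concrete, here is a minimal, self-contained sketch. It assumes Spark SQL (2.x or later) is on the classpath and invents small sample schemas and rows matching the column names in the question; combining the two equality conditions with Column.and(...) instead of Java's && operator (which only works on primitive booleans) is the key point:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class MultiColumnJoin {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MultiColumnJoin")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical sample data standing in for the question's DataFrames.
        StructType schemaA = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("z", DataTypes.StringType);
        StructType schemaB = new StructType()
                .add("x", DataTypes.IntegerType)
                .add("y", DataTypes.IntegerType)
                .add("a", DataTypes.StringType);

        List<Row> rowsA = Arrays.asList(
                RowFactory.create(1, 10, "z1"),
                RowFactory.create(2, 20, "z2"));
        List<Row> rowsB = Arrays.asList(
                RowFactory.create(1, 10, "a1"),
                RowFactory.create(2, 99, "a2"));

        Dataset<Row> a = spark.createDataFrame(rowsA, schemaA);
        Dataset<Row> b = spark.createDataFrame(rowsB, schemaB);

        // Combine the equality conditions with Column.and(...)
        // rather than the Java && operator.
        Dataset<Row> joined = a.join(
                b,
                a.col("x").equalTo(b.col("x"))
                        .and(a.col("y").equalTo(b.col("y"))),
                "inner");

        joined.show();

        spark.stop();
    }
}
```

Note that joining on a Column condition keeps both copies of x and y in the result; the Seq-based join below deduplicates them.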
If you want to use multiple columns for the join, you can do something like this:
a.join(b,scalaSeq, joinType)
You can store your columns in a Java List and convert the List to a Scala Seq. Conversion of a Java List to a Scala Seq:
scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();
Example:
a = a.join(b, scalaSeq, "inner");
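Putting the conversion and the join together, a small helper method along these lines could wrap the pattern (a sketch assuming Spark with Scala 2.12 on the classpath; the method and parameter names are made up for illustration):

```java
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.collection.JavaConverters;
import scala.collection.Seq;

public class SeqJoin {
    // Joins a and b on the given column names. Because the names are
    // passed as a runtime List, dynamic column sets are easy to support.
    static Dataset<Row> joinOn(Dataset<Row> a, Dataset<Row> b,
                               List<String> cols, String joinType) {
        Seq<String> scalaSeq = JavaConverters
                .asScalaIteratorConverter(cols.iterator())
                .asScala()
                .toSeq();
        return a.join(b, scalaSeq, joinType);
    }
}
```

Usage would be e.g. joinOn(a, b, Arrays.asList("x", "y"), "inner"); unlike the Column-condition form, this variant emits a single x and y column in the result.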
Note: dynamic columns are easily supported this way.