I am trying to perform a right join in my Spark application in Java, like this:
Dataset<Row> dataset3 = dataset1.join(dataset2,
(Seq<String>) dataset1.col("target_guid"),RightOuter.sql());
But I am getting this error:
java.lang.ClassCastException: org.apache.spark.sql.Column cannot be cast to scala.collection.Seq
Other than this, I couldn't find a way to use joins on Datasets in Java. Could anyone help me find a way to do this?
You can change your code to something like this:
Dataset<Row> dataset3 = dataset1.as("dataset1").join(dataset2.as("dataset2"),
    dataset1.col("target_guid").equalTo(dataset2.col("target_guid")), "right_outer");
If you want to use the Dataset API below in Java -
def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame
then convert the string list into a Scala Seq. Keep the method below handy for converting a Java List to a Scala Seq, as most of the Spark APIs accept a Scala Seq:
import java.util.List;
import scala.collection.JavaConversions;
import scala.collection.mutable.Buffer;

// Buffer implements scala.collection.Seq, so the result can be passed
// wherever the Spark API expects a Seq
<T> Buffer<T> toScalaSeq(List<T> list) {
    return JavaConversions.asScalaBuffer(list);
}
Note that JavaConversions is deprecated since Scala 2.12 (and removed in 2.13); on newer versions use scala.collection.JavaConverters.asScalaBuffer(list) instead.
Also, you can't pass RightOuter.sql() as the joinType, because it evaluates to 'RIGHT OUTER' (with a space), which Spark's join-type parser does not accept. The supported join type strings are:
'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'leftanti', 'left_anti', 'cross'
Now you can use:
Dataset<Row> dataset3 = dataset1.join(dataset2,
toScalaSeq(Arrays.asList("target_guid")), "rightouter");
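Putting the pieces together, here is a minimal self-contained sketch, assuming spark-sql and scala-library are on the classpath and run with a local master. The Record bean and its values are purely illustrative; only the target_guid join column comes from the original question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.collection.JavaConversions;
import scala.collection.mutable.Buffer;

public class RightJoinExample {

    // Convert a Java List into a Scala Seq (Buffer implements Seq)
    public static <T> Buffer<T> toScalaSeq(List<T> list) {
        return JavaConversions.asScalaBuffer(list);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("right-join-example")
                .master("local[*]")
                .getOrCreate();

        // Two tiny illustrative datasets sharing a target_guid column
        Dataset<Row> dataset1 = spark.createDataFrame(
                Arrays.asList(new Record("a", 1), new Record("b", 2)), Record.class);
        Dataset<Row> dataset2 = spark.createDataFrame(
                Arrays.asList(new Record("b", 20), new Record("c", 30)), Record.class);

        // Right outer join on the shared column; with usingColumns the
        // join column appears only once in the result
        Dataset<Row> dataset3 = dataset1.join(dataset2,
                toScalaSeq(Arrays.asList("target_guid")), "rightouter");

        // Keeps every row of dataset2 ("b" and "c"); dataset1's columns
        // are null for "c", which has no match on the left side
        dataset3.show();
        spark.stop();
    }

    // Illustrative JavaBean so createDataFrame can infer a schema
    public static class Record implements java.io.Serializable {
        private String target_guid;
        private int value;

        public Record() {}
        public Record(String g, int v) { target_guid = g; value = v; }

        public String getTarget_guid() { return target_guid; }
        public void setTarget_guid(String g) { target_guid = g; }
        public int getValue() { return value; }
        public void setValue(int v) { value = v; }
    }
}
```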