
java.lang.ClassCastException: org.apache.spark.sql.Column cannot be cast to scala.collection.Seq

I am trying to perform a right join in a Spark application in Java, like this:

Dataset<Row> dataset3 = dataset1.join(dataset2, 
       (Seq<String>) dataset1.col("target_guid"),RightOuter.sql());

But I am getting this error:

java.lang.ClassCastException: org.apache.spark.sql.Column cannot be 
cast to scala.collection.Seq

Other than this, I couldn't find a way to use joins on Datasets in Java. Could anyone help me find a way to do this?

You can change your code to something like this:

Dataset<Row> dataset3 = dataset1.as("dataset1").join(dataset2.as("dataset2"),
                dataset1.col("target_guid").equalTo(dataset2.col("target_guid")), "rightouter");
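Here is a minimal, self-contained sketch of that condition-based join. The schemas and sample rows are illustrative (only `target_guid` comes from the question); it assumes a local Spark session:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RightJoinConditionExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("right-join-sketch")
                .master("local[*]")
                .getOrCreate();

        // Illustrative schemas: only target_guid is from the question
        StructType schema1 = new StructType()
                .add("target_guid", DataTypes.StringType)
                .add("name", DataTypes.StringType);
        StructType schema2 = new StructType()
                .add("target_guid", DataTypes.StringType)
                .add("score", DataTypes.IntegerType);

        Dataset<Row> dataset1 = spark.createDataFrame(Arrays.asList(
                RowFactory.create("g1", "alice"),
                RowFactory.create("g2", "bob")), schema1);
        Dataset<Row> dataset2 = spark.createDataFrame(Arrays.asList(
                RowFactory.create("g2", 10),
                RowFactory.create("g3", 20)), schema2);

        // Join on an explicit Column condition; the join type is the
        // plain string "right_outer", not RightOuter.sql()
        Dataset<Row> dataset3 = dataset1.join(dataset2,
                dataset1.col("target_guid").equalTo(dataset2.col("target_guid")),
                "right_outer");

        // g2 matches both sides; g3 appears with nulls on the left side
        dataset3.show();
        spark.stop();
    }
}
```

Note that this variant keeps the `target_guid` column from both sides in the result, so the joined schema has four columns.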

If you want to use the following Dataset API from Java -

 def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): DataFrame 

then convert the Java string list into a Scala Seq. Keep the method below handy for converting a Java List to a Scala Seq, since many of the Spark APIs accept a Scala Seq:

import java.util.List;
import scala.collection.JavaConversions;
import scala.collection.mutable.Buffer;

<T> Buffer<T> toScalaSeq(List<T> list) {
    return JavaConversions.asScalaBuffer(list);
}
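As a side note, scala.collection.JavaConversions is deprecated in Scala 2.12 and removed in 2.13. If you are on a newer Scala version, a sketch of the same helper using scala.collection.JavaConverters instead (a mutable Buffer is already a Seq, so it can be returned directly):

```java
import java.util.Arrays;
import java.util.List;

import scala.collection.JavaConverters;
import scala.collection.Seq;

public class SeqConversion {
    // Same helper shape as above, but via the non-deprecated JavaConverters API
    static <T> Seq<T> toScalaSeq(List<T> list) {
        return JavaConverters.asScalaBuffer(list);
    }

    public static void main(String[] args) {
        Seq<String> seq = toScalaSeq(Arrays.asList("a", "b", "c"));
        System.out.println(seq.length());
    }
}
```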

Also, you can't pass RightOuter.sql() as the joinType, because it evaluates to 'RIGHT OUTER', which is not a valid join type string. The supported join types are:

'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'leftanti', 'left_anti', 'cross'

Now you can use:

Dataset<Row> dataset3 = dataset1.join(dataset2,
                toScalaSeq(Arrays.asList("target_guid")), "rightouter");
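Putting it together, a self-contained sketch of the usingColumns variant (again with illustrative schemas and sample data; only `target_guid` is from the question). Unlike the Column-condition form, this variant de-duplicates the join key, so the result contains a single `target_guid` column:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.collection.JavaConversions;
import scala.collection.Seq;

public class RightJoinUsingColumnsExample {
    // Java List -> Scala Seq helper from the answer above
    static <T> Seq<T> toScalaSeq(List<T> list) {
        return JavaConversions.asScalaBuffer(list);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("using-columns-join")
                .master("local[*]")
                .getOrCreate();

        // Illustrative schemas: only target_guid is from the question
        StructType schema1 = new StructType()
                .add("target_guid", DataTypes.StringType)
                .add("name", DataTypes.StringType);
        StructType schema2 = new StructType()
                .add("target_guid", DataTypes.StringType)
                .add("score", DataTypes.IntegerType);

        Dataset<Row> dataset1 = spark.createDataFrame(Arrays.asList(
                RowFactory.create("g1", "alice"),
                RowFactory.create("g2", "bob")), schema1);
        Dataset<Row> dataset2 = spark.createDataFrame(Arrays.asList(
                RowFactory.create("g2", 10),
                RowFactory.create("g3", 20)), schema2);

        // usingColumns join: one shared target_guid column in the output
        Dataset<Row> dataset3 = dataset1.join(dataset2,
                toScalaSeq(Arrays.asList("target_guid")), "rightouter");

        dataset3.printSchema();
        dataset3.show();
        spark.stop();
    }
}
```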
