How do I create an Encoder of type List[Row] for creating a Dataset[ List[Row] ] in spark?

Question

Basically, I am performing a 'groupByKey' followed by a 'mapGroups' transformation on a Spark DataFrame. 'mapGroups' produces a Dataset[U], which requires an Encoder of type 'U'. I am converting each group of values to the List[Row] type, so I have to pass an Encoder[List[Row]]. I am able to create an Encoder of type 'Row' from its schema, but I don't know how to create an Encoder for the 'List[Row]' datatype.

import sqlContext.implicits._
import org.apache.spark.sql._
import org.apache.spark.sql.catalyst.encoders._

// df is a DataFrame with a Long column "_id"
val groupedDataset = df.repartition($"_id")
                        .groupByKey(row => row.getAs[Long]("_id"))
                        .mapGroups((key, value) => value.toList)(???) // <- an Encoder[List[Row]] is required here
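One option, not taken from the original thread, is Spark's generic Kryo encoder, Encoders.kryo[List[Row]], which serializes each List[Row] into a single binary value. This is a minimal sketch, assuming a local SparkSession and a hypothetical two-column DataFrame ("_id", "value") standing in for df; the object and method names are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoder, Encoders, Row, SparkSession}

object KryoListRowExample {
  // Groups `df` by its Long "_id" column and keeps each group as a List[Row].
  def groupRows(df: DataFrame): Dataset[List[Row]] = {
    // Kryo-based encoder: each List[Row] is stored as one opaque binary value.
    val listRowEncoder: Encoder[List[Row]] = Encoders.kryo[List[Row]]

    df.groupByKey(row => row.getAs[Long]("_id"))(Encoders.scalaLong)
      // toList materializes the group's iterator into a List before mapGroups returns.
      .mapGroups((_, rows) => rows.toList)(listRowEncoder)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("kryo-list-row")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the question's `df`.
    val df = Seq((1L, "a"), (1L, "b"), (2L, "c")).toDF("_id", "value")

    groupRows(df).collect().foreach { group =>
      // Column 1 of each Row is "value"; order within a group is not guaranteed.
      println(group.map(_.getString(1)).mkString(","))
    }
    spark.stop()
  }
}
```

The trade-off of this design: the resulting Dataset has a single binary column, so Spark SQL cannot look inside the grouped rows; the encoder only helps if the lists are consumed as Scala objects afterwards.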

Answer 1

You can use Seq together with import spark.implicits._. But if this is your use case, you can do without it entirely:

import org.apache.spark.sql.functions.collect_list
df.groupBy("_id").agg(collect_list("the column whose values you want to collect"))
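The aggregation above can also keep every column of each row: wrapping the columns in struct(...) before collect_list yields one array<struct> column per _id, which is essentially the List[Row] of each group, with no custom encoder. This is a minimal sketch, assuming a local SparkSession and hypothetical sample data; the object and method names are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object CollectListExample {
  // One output row per "_id", with all rows of that group packed into an array<struct> column "rows".
  def groupRows(df: DataFrame): DataFrame =
    df.groupBy("_id")
      .agg(collect_list(struct(df.columns.map(col): _*)).as("rows"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-list-rows")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the question's `df`.
    val df = Seq((1L, "a"), (1L, "b"), (2L, "c")).toDF("_id", "value")

    val grouped = groupRows(df)
    grouped.printSchema() // "rows" is an array of structs holding every original column

    grouped.collect().foreach { r =>
      // getSeq returns the collected structs as a Seq[Row]; order within a group is not guaranteed.
      val rows: Seq[Row] = r.getSeq[Row](r.fieldIndex("rows"))
      println(s"${r.getLong(0)} -> ${rows.map(_.getString(1)).mkString(",")}")
    }
    spark.stop()
  }
}
```

Unlike a Kryo-encoded result, the "rows" column keeps a real schema, so it can still be queried (for example with explode) after grouping.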

Answer 1: accepted, score 0, 2018-05-24 11:52:11