
Convert Scala DataFrame to Dataset with array type column

I have a Scala DataFrame that looks like this:

+--------+--------------------+
|     uid|     recommendations|
+--------+--------------------+
|41344966|[[2174, 4.246965E...|
|41345063|[[2174, 0.0015455...|
|41346177|[[2996, 4.137125E...|
|41349171|[[2174, 0.0010590...|

df: org.apache.spark.sql.DataFrame = [uid: int, recommendations: array<struct<iid:int,rating:float>>]

I would like to convert it to a Scala Dataset, to take advantage of the added functionality. However, I am new to Scala and unclear on how to write the case class for the conversion when a column holds an array of structs with mixed field types. This is what I have:

val query = "SELECT * FROM myTable"
val df = spark.sql(query)

case class userRecs (uid: String, recommendations: Array[Int])
val ds = df.as[userRecs]

The error I get is:

org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(lambdavariable(MapObjects_loopValue47, MapObjects_loopIsNull47, StructField(iid,IntegerType,true), StructField(rating,FloatType,true), true) AS INT)' due to data type mismatch: cannot cast struct<iid:int,rating:float> to int;

How should I rewrite my class?

The solution was to create a second case class that the outer case class could use. Each element of recommendations is a struct<iid:int,rating:float>, so it needs its own case class rather than a plain Int:

case class productScore (iid: Int, rating: Float)
case class userRecs (uid: Int, recommendations: Array[productScore])

val ds = df.as[userRecs]
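
As a follow-up sketch (building on the ds defined above, and assuming import spark.implicits._ is in scope, e.g. in spark-shell), the typed Dataset exposes each row as a userRecs object, so the nested structs can be handled with ordinary Scala collection operations:

// Sketch: per user, sort recommendations by rating and keep only the item ids.
// The tuple (Int, Array[Int]) result is encoded via spark.implicits._.
val topItems = ds.map(r => (r.uid, r.recommendations.sortBy(-_.rating).map(_.iid)))
topItems.show(false)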
