Scala Spark Dataframe - Sum for each row the content of Array[Double]
Here is my base dataframe:
root
 |-- user_id: string (nullable = true)
 |-- review_id: string (nullable = true)
 |-- review_influence: double (nullable = false)
The goal is to get the sum of review_influence for each user_id. So I tried to aggregate the data and sum it like this:
val review_influence_listDF = review_with_influenceDF
  .groupBy("user_id")
  .agg(collect_list("review_id").as("list_review_id"),
       collect_list("review_influence").as("list_review_influence"))
  .agg(sum($"list_review_influence"))
But I got this error:
org.apache.spark.sql.AnalysisException: cannot resolve 'sum(`list_review_influence`)' due to data type mismatch: function sum requires numeric types, not ArrayType(DoubleType,true);;
What should I do?
You can sum the column directly inside the agg function, without collecting it into an array first:
review_with_influenceDF
.groupBy("user_id")
.agg(collect_list($"review_id").as("list_review_id"),
sum($"review_influence").as("sum_review_influence"))