簡體   English   中英

Scala Spark數據框-每行的總和為Array [Double]

[英]Scala Spark Dataframe - Sum for each row the content of Array[Double]

這是我的基本數據框:

root |-- user_id: string (nullable = true) 
     |-- review_id: string (nullable = true) 
     |-- review_influence: double (nullable = false)

目標是獲得每個user_id的review_influence之和。 所以我試圖匯總數據並總結如下:

val review_influence_listDF = review_with_influenceDF
.groupBy("user_id")
.agg(collect_list("review_id") as("list_review_id"), collect_list("review_influence") as ("list_review_influence"))
.agg(sum($"list_review_influence"))

但是我有這個錯誤:

org.apache.spark.sql.AnalysisException: cannot resolve 'sum(`list_review_influence`)' due to data type mismatch: function sum requires numeric types, not ArrayType(DoubleType,true);;

我該怎么辦?

您可以直接將agg函數中的列求和:

review_with_influenceDF
    .groupBy("user_id")
    .agg(collect_list($"review_id").as("list_review_id"), 
         sum($"review_influence").as("sum_review_influence"))

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM