Scala Spark Dataframe - Sum for each row the content of Array[Double]

This is my basic DataFrame:

root |-- user_id: string (nullable = true) 
     |-- review_id: string (nullable = true) 
     |-- review_influence: double (nullable = false)

The goal is to get the sum of review_influence for each user_id, so I tried to aggregate the data and sum it up like this:

val review_influence_listDF = review_with_influenceDF
  .groupBy("user_id")
  .agg(collect_list("review_id").as("list_review_id"),
       collect_list("review_influence").as("list_review_influence"))
  .agg(sum($"list_review_influence"))

But I get this error:

org.apache.spark.sql.AnalysisException: cannot resolve 'sum(`list_review_influence`)' due to data type mismatch: function sum requires numeric types, not ArrayType(DoubleType,true);;

What can I do about it?

You can sum the column directly in the agg function, without collecting it into an array first:

review_with_influenceDF
    .groupBy("user_id")
    .agg(collect_list($"review_id").as("list_review_id"), 
         sum($"review_influence").as("sum_review_influence"))
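To make the fix concrete, here is a minimal self-contained sketch. The column names come from the question; the SparkSession setup and the sample data are assumptions added only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SumPerUser {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sum-per-user-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the schema in the question.
    val review_with_influenceDF = Seq(
      ("u1", "r1", 0.5),
      ("u1", "r2", 1.5),
      ("u2", "r3", 2.0)
    ).toDF("user_id", "review_id", "review_influence")

    // Sum the numeric column directly in agg -- no need to collect
    // the doubles into an Array[Double] first.
    val perUser = review_with_influenceDF
      .groupBy("user_id")
      .agg(collect_list($"review_id").as("list_review_id"),
           sum($"review_influence").as("sum_review_influence"))

    perUser.show()

    spark.stop()
  }
}
```

As a side note: if you genuinely need to sum an existing Array[Double] column per row, Spark 3.x offers the `aggregate` higher-order function, e.g. `aggregate($"list_review_influence", lit(0.0), (acc, x) => acc + x)` (assumption: Spark >= 3.0).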

