Spark DataFrame column groupBy and collect_list not following orderBy
I have a Spark 2.2.0 DataFrame dtfBase1, as below. BAQ is an ID, AAA is a date, and AAG is a numeric value of type double.
I would like to convert it into the following. The values of AAG should be indexed according to the order of AAA.
I used the following code:
val dtfBase2=dtfBase1.orderBy($"BAQ",$"AAA").groupBy("BAQ").agg(collect_list("AAG") as "arrAAG")
But in dtfBase2 the values of AAG seem to follow a random order instead of AAA's order in the original DataFrame. How can I index the elements of arrAAG according to the order of AAA?
Assuming you're on Spark 2.4+, you can use array_sort and array_join:
val dtfBase2 = dtfBase1.groupBy("BAQ")
  .agg(array_sort(collect_list(struct('aaa, 'aag))) as "arrAAG")
  .select('baq, array_join($"arrAAG.aag", ",") as "arrAAG")
It creates a struct of AAA and AAG, collects those structs in the aggregate, and then sorts the resulting array. Structs are compared field by field, so with AAA as the first field the array ends up in AAA order. We then concatenate using array_join, but only on the AAG field of the struct.
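To make the struct-then-sort idea concrete, here is a minimal sketch that can be pasted into spark-shell or run as a script on Spark 2.4+. The sample rows, the `id1`/`id2` keys, the `sorted` column name and the local SparkSession are all assumptions made up for illustration; AAA is stored as ISO date strings here, which sort in the same order as the dates they represent.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_join, array_sort, collect_list, struct}

// Local session for the sketch only (assumption: no existing `spark` to reuse)
val spark = SparkSession.builder().master("local[*]").appName("struct-sort-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data, deliberately out of AAA order
val dtfBase1 = Seq(
  ("id1", "2017-03-01", 3.0),
  ("id1", "2017-01-01", 1.0),
  ("id1", "2017-02-01", 2.0),
  ("id2", "2017-02-01", 20.0),
  ("id2", "2017-01-01", 10.0)
).toDF("BAQ", "AAA", "AAG")

val dtfBase2 = dtfBase1.groupBy("BAQ")
  // AAA is the first struct field, so sorting the structs sorts by AAA
  .agg(array_sort(collect_list(struct($"AAA", $"AAG"))) as "sorted")
  // Explicit cast so the sketch does not rely on implicit array casting
  .select($"BAQ", array_join($"sorted.AAG".cast("array<string>"), ",") as "arrAAG")

// Collect to a Map keyed by BAQ so the per-group strings are easy to inspect
val result = dtfBase2.as[(String, String)].collect().toMap
```

The ordering comes entirely from the sort inside the aggregate, so it does not depend on how the rows happen to be ordered or partitioned before the groupBy.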
Since you're on Spark 2.2, this version should work:
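val dtfBase2 = dtfBase1.groupBy("BAQ")
  .agg(sort_array(collect_list(struct('aaa, 'aag))) as "arrAAG")
  // Explicit cast: concat_ws expects string elements, and older Spark
  // versions may not cast array<double> to array<string> implicitly
  .select('baq, concat_ws(",", $"arrAAG.aag".cast("array<string>")) as "arrAAG")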
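If arrAAG should stay an array of doubles rather than a comma-joined string, the field can be pulled out of the sorted structs directly. The following is a minimal sketch using only sort_array, collect_list and struct, so it should also apply to Spark 2.2; the sample rows, the `sorted` column name and the local SparkSession are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, sort_array, struct}

// Local session for the sketch only (assumption: no existing `spark` to reuse)
val spark = SparkSession.builder().master("local[*]").appName("sorted-collect-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data, deliberately out of AAA order
val dtfBase1 = Seq(
  ("id1", "2017-03-01", 3.0),
  ("id1", "2017-01-01", 1.0),
  ("id1", "2017-02-01", 2.0),
  ("id2", "2017-02-01", 20.0),
  ("id2", "2017-01-01", 10.0)
).toDF("BAQ", "AAA", "AAG")

val dtfBase2 = dtfBase1.groupBy("BAQ")
  // Sort the (AAA, AAG) structs; AAA comes first, so it drives the order
  .agg(sort_array(collect_list(struct($"AAA", $"AAG"))) as "sorted")
  // Selecting a field from an array of structs yields an array of that field
  .select($"BAQ", $"sorted.AAG" as "arrAAG")

// Collect to a Map keyed by BAQ so the per-group arrays are easy to inspect
val result = dtfBase2.as[(String, Seq[Double])].collect().toMap
```

Here `result("id1")` holds the AAG values in AAA order, with no orderBy needed before the groupBy.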
I did this and it worked out. Somehow, by caching dtfBase1 with the desired orderBy, the order was remembered somewhere and got passed to the next step. Feel free to suggest a way of doing it in one line.
val dtfBase1=....orderBy($"BAQ",$"AAA").cache()
val dtfBase2=dtfBase1.orderBy($"BAQ",$"AAA").groupBy("BAQ").agg(collect_list("AAG") as "arrAAG")