
Spark - aggregated column disappears from a DataFrame after join

I wanted to count the number of items for each sale_id and decided to use a count function. The idea was to have item_numbers as the last column, without affecting the original column ordering of salesDf.

But after the join, the sale_id column became the first one in df3. To fix this I tried .select(salesDf.schema.fieldNames.map(col):_*), but after that the item_numbers column is missing (while the ordering of the other columns is correct).

How do I preserve the correct column ordering while keeping the item_numbers column in place?

 import org.apache.spark.sql.functions.{col, count}

 val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
 val df3 = salesDf.join(df2, "sale_id").select(salesDf.schema.fieldNames.map(col):_*)

The item_numbers column disappears because salesDf.schema.fieldNames contains only the original columns of salesDf, so selecting exactly those fields drops the aggregated column. To preserve salesDf's column order in the final result while keeping item_numbers, assemble the column list for select as follows:

val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
val df3 = salesDf.join(df2, "sale_id")

// Original column order, with the aggregated column appended at the end
val orderedCols = salesDf.columns :+ "item_numbers"
val resultDF = df3.select(orderedCols.map(col): _*)
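
For reference, here is a minimal, self-contained sketch of the whole flow, assuming a local SparkSession and a hypothetical salesDf with columns sale_id, item_id, and price (only sale_id and item_id appear in the question; price is illustrative):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count}

object PreserveColumnOrder extends App {
  val spark = SparkSession.builder()
    .appName("preserve-column-order")
    .master("local[*]") // local mode, just for this sketch
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sales data; the price column is illustrative only
  val salesDf = Seq(
    (1, "a", 10.0),
    (1, "b", 5.0),
    (2, "c", 7.5)
  ).toDF("sale_id", "item_id", "price")

  // Count items per sale, then join the counts back onto the original rows
  val df2 = salesDf.groupBy("sale_id").agg(count("item_id").as("item_numbers"))
  val df3 = salesDf.join(df2, "sale_id")

  // Re-select with the original columns first and item_numbers appended last
  val orderedCols = salesDf.columns :+ "item_numbers"
  val resultDF = df3.select(orderedCols.map(col): _*)

  resultDF.show()
  // Columns come out as: sale_id, item_id, price, item_numbers

  spark.stop()
}

Note that the row order after a join is not guaranteed, but the column order of resultDF is exactly orderedCols: the original salesDf columns followed by item_numbers.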
