List the distinct values of all columns of a Spark DataFrame

Question

I have a DataFrame like the one below, and I want to convert it to the expected format, with the distinct values of each column collected into a list.

+---------------------+---------------+
|col1                 |col2           |
+---------------------+---------------+
|                  A  |             1 |
|                  B  |             2 |
|                  C  |             1 |
|                  D  |             1 |
|                  A  |             2 |
|               null  |             1 |
+---------------------+---------------+

Expected format

+---------------------+---------------+
|col1                 |col2           |
+---------------------+---------------+
|      [A,B,C,D,null] |         [1,2] |
+---------------------+---------------+

Is there any way to solve the above problem?

Thanks in advance!

Answer 1

You can do something like this:

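import org.apache.spark.sql.{functions => func}
import spark.implicits._

// collect_set drops nulls, so replace them in col1 with the string "null" first
df
  .na.fill("null", Seq("col1"))
  .agg(
    // each collect_set gathers the distinct values of one column into an array
    func.collect_set($"col1").alias("col1"),
    func.collect_set($"col2").alias("col2")
  )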
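
Since the question asks about all columns, the same idea can be applied without naming each column by hand. The following is only a minimal sketch, assuming a local SparkSession and the sample data from the question; the object name `DistinctPerColumn` and the helper `distinctPerColumn` are hypothetical names introduced for illustration. Note that `collect_set` does not guarantee any element order, so the arrays may not come back sorted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set}

object DistinctPerColumn {

  // Hypothetical helper: one collect_set per column, keeping the column names.
  def distinctPerColumn(df: DataFrame): DataFrame = {
    // collect_set skips nulls. Filling with a String only touches string
    // columns, so nulls in non-string columns would still be dropped.
    val filled = df.na.fill("null")
    filled.select(filled.columns.map(c => collect_set(col(c)).alias(c)): _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-per-column")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question; Option models the null in col1.
    val df = Seq[(Option[String], Int)](
      (Some("A"), 1), (Some("B"), 2), (Some("C"), 1),
      (Some("D"), 1), (Some("A"), 2), (None, 1)
    ).toDF("col1", "col2")

    // Prints a single row with one array per column.
    distinctPerColumn(df).show(false)

    spark.stop()
  }
}
```

If a real null is needed in the resulting array rather than the string "null", this approach is not enough, because the placeholder is an ordinary string value.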

Solution 1
1 2019-10-20 09:40:16
