
Distinct values of all columns of a Spark DataFrame as lists

I have a DataFrame like the one below, and I want to convert it to the expected format, where each column holds the list of its distinct values.

+---------------------+---------------+
|col1                 |col2           |
+---------------------+---------------+
|                  A  |             1 |
|                  B  |             2 |
|                  C  |             1 |
|                  D  |             1 |
|                  A  |             2 |
|               null  |             1 |
+---------------------+---------------+

Expected format:

+---------------------+---------------+
|col1                 |col2           |
+---------------------+---------------+
|      [A,B,C,D,null] |         [1,2] |
+---------------------+---------------+

Is there any way to solve this problem?

Thanks in advance!

You can do something like this:

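import org.apache.spark.sql.{functions => func}
import spark.implicits._  // enables the $"col" syntax

df
  // collect_set skips nulls, so replace them with the string "null" first
  .na.fill("null", Seq("col1"))
  .agg(
    func.collect_set($"col1").alias("col1"),
    func.collect_set($"col2").alias("col2")
  )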
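
Since the question asks about all the columns, the same idea can be applied without naming each column by hand. The following is only a sketch under some assumptions: the object name `DistinctPerColumn`, the helper `distinctPerColumn`, and the local `SparkSession` setup are made up for illustration; the DataFrame is assumed to have at least one column; and column names are assumed not to contain dots or other characters that `col` would interpret specially.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set}

object DistinctPerColumn {
  // Hypothetical helper: one output row, one array of distinct values per column.
  def distinctPerColumn(df: DataFrame): DataFrame = {
    // na.fill with a String only touches string columns; nulls in columns of
    // other types are left alone and are then skipped by collect_set.
    val filled = df.na.fill("null")
    val aggs = filled.columns.map(c => collect_set(col(c)).alias(c))
    // agg takes a first Column plus varargs; assumes at least one column.
    filled.agg(aggs.head, aggs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session purely for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-per-column")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the question.
    val df = Seq[(String, Int)](
      ("A", 1), ("B", 2), ("C", 1), ("D", 1), ("A", 2), (null, 1)
    ).toDF("col1", "col2")

    distinctPerColumn(df).show(false)
    spark.stop()
  }
}
```

Note that `collect_set` makes no guarantee about the order of the elements in each array, so the lists may not come back in the order shown in the expected format. The "null" entry in `col1` is the literal string "null", not an actual null value.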
