Distinct values of all columns as lists from a Spark DataFrame
I have a DataFrame like the one below, and I want to convert it to the expected format, with the distinct values of each column collected into a list.
+------+------+
| col1 | col2 |
+------+------+
| A    | 1    |
| B    | 2    |
| C    | 1    |
| D    | 1    |
| A    | 2    |
| null | 1    |
+------+------+
Expected format:
+----------------+-------+
| col1           | col2  |
+----------------+-------+
| [A,B,C,D,null] | [1,2] |
+----------------+-------+
Is there any way to solve the above problem?
Thanks in advance!
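You can do something like this. collect_set skips null values, so the nulls in col1 are first replaced with the string "null" to keep them in the result. Note that the order of elements in the collected arrays is not guaranteed.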
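import org.apache.spark.sql.{functions => func}
import spark.implicits._

// collect_set drops nulls, so replace null in col1 with the string "null" first
df
  .na.fill("null", Seq("col1"))
  .agg(
    func.collect_set($"col1").alias("col1"),
    func.collect_set($"col2").alias("col2")
  )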
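Since the title asks for the distinct values of all columns, here is a minimal sketch, assuming the Spark Scala API, that builds the aggregation from df.columns instead of naming each column by hand. The object name DistinctPerColumn and the method distinctLists are hypothetical. It uses na.fill("null"), which only replaces nulls in string columns, so nulls in numeric columns are still skipped by collect_set. sort_array is added so the element order is deterministic, which collect_set alone does not promise.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, sort_array}

object DistinctPerColumn {
  // Hypothetical helper: one row, one array column per input column,
  // each array holding that column's distinct values.
  def distinctLists(df: DataFrame): DataFrame = {
    // na.fill(String) only touches string columns; nulls in other types are still dropped
    val filled = df.na.fill("null")
    val aggs: Seq[Column] = filled.columns.toSeq.map { c =>
      // sort_array gives a stable order; collect_set alone does not
      sort_array(collect_set(col(c))).alias(c)
    }
    filled.agg(aggs.head, aggs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("distinct-lists").getOrCreate()
    import spark.implicits._

    // Sample data from the question; Option[String] yields a nullable string column
    val df = Seq[(Option[String], Int)](
      (Some("A"), 1), (Some("B"), 2), (Some("C"), 1),
      (Some("D"), 1), (Some("A"), 2), (None, 1)
    ).toDF("col1", "col2")

    distinctLists(df).show(false)
    spark.stop()
  }
}
```

Because the column list comes from df.columns, the same call works for any number of columns; if nulls in non-string columns also need to be kept, those columns would have to be cast to string before filling.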