Spark DataFrame filter using a list of column names

Question

I need to filter a Spark DataFrame for non-null column values, using the column names held in a List[String]:

val keyList = List("columnA", "columnB", "columnC", "columnD", ...)

For a single column named key, the syntax is:

val nonNullDf = df.filter(col("key").isNotNull)

My question is: how do I apply keyList in the filter above?

Answer 1

You can generate the filter by doing a map-reduce over keyList: map each column name to an isNotNull condition, then reduce those conditions into a single one.

Combine them with and if you want to keep the rows where all of the columns are non-null, or with or if you want to keep the rows where at least one column is non-null.

val nonNullDf = df.filter(keyList.map(col(_).isNotNull).reduce(_ and _))
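
To make the difference between the two combinators concrete, here is a minimal sketch that runs both variants on a tiny DataFrame. The sample rows, the local SparkSession setup, the object name, and the two-column keyList are assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object FilterByKeyList {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for trying the filter out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-by-key-list")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: None becomes a null cell
    val df = Seq[(Option[String], Option[String])](
      (Some("a"), Some("b")), // both columns set
      (Some("a"), None),      // only columnA set
      (None, None)            // both columns null
    ).toDF("columnA", "columnB")

    val keyList = List("columnA", "columnB")

    // One isNotNull condition per column name
    val conditions = keyList.map(col(_).isNotNull)

    // Keep rows where every listed column is non-null
    val allNonNullDf = df.filter(conditions.reduce(_ and _))

    // Keep rows where at least one listed column is non-null
    val anyNonNullDf = df.filter(conditions.reduce(_ or _))

    println(allNonNullDf.count()) // 1: only the first row
    println(anyNonNullDf.count()) // 2: the first and second rows

    spark.stop()
  }
}
```

Note that reduce throws on an empty list, so if keyList may be empty, it is probably safer to check for that case first or to fold from a neutral starting condition instead.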

Solution 1
1 accepted, 2021-04-19 14:59:24
