How do I filter rows based on whether a column value is in a Set of Strings in a Spark DataFrame?
Is there a more elegant way to filter based on the values in a Set of Strings?
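import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.udf

def myFilter(actions: Set[String], myDF: DataFrame): DataFrame = {
  // UDF that returns true when the row's action is one of the allowed values
  val containsAction = udf((action: String) => actions.contains(action))
  // Refer to the column through the DataFrame so no implicit conversions are needed
  myDF.filter(containsAction(myDF("action")))
}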
In SQL you can do:
select * from myTable where action in ('action1', 'action2', 'action3')
How about this:
myDF.filter("action in (1,2)")
Or:
import org.apache.spark.sql.functions.lit
myDF.where($"action".in(Seq(1,2).map(lit(_)):_*))
Or:
import org.apache.spark.sql.functions.lit
myDF.where($"action".in(Seq(lit(1),lit(2)):_*))
Additional support will be added in Spark 1.5 to make this cleaner.
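For what it's worth, Spark 1.5 added `Column.isin`, which takes the values as varargs, so the set can be passed directly without wrapping each element in `lit`. A minimal sketch, assuming Spark 1.5 or later and a string column named `action`; the helper name `filterByActions` is made up for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: keeps only the rows whose "action" value is in `actions`.
// Assumes Spark 1.5+, where Column.isin(list: Any*) is available.
def filterByActions(actions: Set[String], df: DataFrame): DataFrame =
  df.filter(col("action").isin(actions.toSeq: _*))
```

Unlike the UDF version, this should stay a plain column expression, so the optimizer can see the predicate instead of treating it as an opaque function.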