How to write a condition based on multiple values for a DataFrame in Spark

I'm working on a Spark application (using Scala) and I have a List containing multiple values. I'd like to use this list to write a where clause for my DataFrame and select only a subset of rows. For example, my List contains 'value1', 'value2', and 'value3', and I would like to write something like this:

mydf.where($"col1" === "value1" || $"col1" === "value2" || $"col1" === "value3")

How can I do this programmatically, since the list contains many values?

You can map the list of values to a list of "filters" (of type Column), then reduce that list into a single filter by applying the || operator pairwise:

val possibleValues = Seq("value1", "value2", "value3")
val result = mydf.where(possibleValues.map($"col1" === _).reduce(_ || _))
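The same map-and-reduce pattern can be illustrated in plain Scala, without a Spark session: one predicate is built per allowed value, and the predicates are OR-ed together into a single combined predicate (a hypothetical sketch, not the Spark Column type itself):

```scala
// Plain-Scala sketch of the map/reduce pattern used above: build one
// predicate per allowed value, then reduce them into a single predicate
// by combining each pair with ||.
object FilterSketch {
  val possibleValues = Seq("value1", "value2", "value3")

  // Seq of per-value predicates, reduced into one String => Boolean
  val matchesAny: String => Boolean =
    possibleValues
      .map(v => (s: String) => s == v)
      .reduce((f, g) => (s: String) => f(s) || g(s))
}
```

For the original Spark use case, note that the Column API also provides an `isin` method, so the same filter can usually be written more directly as `mydf.where($"col1".isin(possibleValues: _*))`.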
