I have a dataframe with a list of IDs. I would like to filter it down to just a set of IDs and I used .filter() to do it.
I'm running into this error.
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.HashSet$HashTrieSet
My code is pretty simple.
val setofID = Set("112", "113", "114", "121", "118", "120")
val my_dfFiltered = my_df.filter($"id".isin(setofID)).persist
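isin does not accept a Set. It takes varargs (Any*), so a Set passed directly is treated as a single literal value, which Spark cannot convert, hence the error. Convert the Set to a Seq and expand it as varargs with :_* like this: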
val setofID = Set("112", "113", "114", "121", "118", "120").toSeq
val my_dfFiltered = my_df.filter($"id".isin(setofID:_*)).persist
Alternatively, use isInCollection (available since Spark 2.4), which accepts any Iterable, so it should work directly with a Set:
val my_dfFiltered = my_df.filter($"id".isInCollection(setofID)).persist
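The cause of the original error can be seen without Spark at all. The sketch below is plain Scala; receivedArgs is a hypothetical stand-in (not part of Spark) that has the same parameter shape as isin, namely Any*, and simply returns whatever arguments it was given. It shows that a Set passed directly arrives as one argument, while :_* spreads it into one argument per ID.

```scala
object VarargsDemo {
  // Hypothetical stand-in for Column.isin, which is declared as isin(list: Any*).
  // It returns the arguments it received so we can inspect how they were passed.
  def receivedArgs(values: Any*): Seq[Any] = values.toList

  val setofID: Set[String] = Set("112", "113", "114", "121", "118", "120")

  // Passing the Set directly: the varargs parameter holds ONE element, the Set itself.
  // With the real isin, Spark then tries to turn that Set into a literal and fails.
  val passedWhole: Seq[Any] = receivedArgs(setofID)

  // Converting to a Seq and expanding with :_* passes each ID as its own argument.
  val passedExpanded: Seq[Any] = receivedArgs(setofID.toSeq: _*)

  def main(args: Array[String]): Unit = {
    println(passedWhole.size)    // 1
    println(passedExpanded.size) // 6
  }
}
```

The first call is what the failing filter was doing; the second is what the isin(setofID:_*) fix does.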