
Filter Spark Dataframe with list of values in Scala

I am creating a DataFrame from a Hive table using SparkSession, as below. Once it is created, I filter the rows by a list of ids.

val myDF = spark.sql("select * from myhivetable")
val someDF = myDF.where(myDF("id").isin(myList: _*))

Instead of this two-step approach, is there a way I can query the Hive table and filter in a single expression, like below?

val myDF = spark.sql("select * from myhivetable").where (("id").isin(myList:_*))

When I try this, I get a compilation error (a bare string has no isin method; isin is defined on Column).

Could someone suggest the best approach for this? Thanks.
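For reference, the inline form compiles once the column is referenced with the col function (or the $ interpolator) instead of a bare string, since isin is a method on Column. A minimal sketch, assuming myList is a Seq of id values:

```scala
import org.apache.spark.sql.functions.col

// col("id") returns a Column, which has the isin method.
val myDF = spark.sql("select * from myhivetable")
  .where(col("id").isin(myList: _*))

// Equivalent form with the $ interpolator (needs import spark.implicits._):
// val myDF = spark.sql("select * from myhivetable").where($"id".isin(myList: _*))
```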

You could also do an inner join to remove the unwanted ids; something like the below may work.

// toDF on an RDD requires: import spark.implicits._
val ids = sc.parallelize(myList).toDF("id")
someDF.join(ids, ids("id") === someDF("id"))
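A self-contained sketch of the join approach, with illustrative values for myList. Joining on Seq("id") avoids the duplicate id column that a Column-equality join produces, and broadcasting the small ids frame hints Spark to skip the shuffle:

```scala
import org.apache.spark.sql.functions.broadcast
import spark.implicits._

val myList = Seq(1, 2, 3)          // illustrative ids to keep
val ids = myList.toDF("id")        // one-column DataFrame of wanted ids

// Inner join keeps only rows of someDF whose id appears in ids;
// Seq("id") merges the join column so it is not duplicated in the output.
val filtered = someDF.join(broadcast(ids), Seq("id"))
```

For a small list, the isin filter is simpler; the join scales better when the list of ids is large enough that an in-predicate becomes unwieldy.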
