Filter Spark Dataframe with list of values in Scala

Question

I am creating a DataFrame from a Hive table using SparkSession, as shown below. Once it is created, I filter the rows by a list of ids.

val myDF = spark.sql("select * from myhivetable")
val someDF = myDF.where(myDF("id").isin(myList:_*))

Instead of this approach, is there a way to query the Hive table and apply the filter in a single statement, as below?

val myDF = spark.sql("select * from myhivetable").where (("id").isin(myList:_*))

When I try this, I get a compilation error.

Could someone suggest the best approach for this? Thanks.

Answer 1

You could also use an inner join to drop the unwanted ids; something like the following may work.

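import spark.implicits._  // provides toDF on RDDs
val ids = sc.parallelize(myList).toDF("id")
// In Scala a column is referenced as df("id"); the inner join keeps only rows whose id is in ids
someDF.join(ids, ids("id") === someDF("id"))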
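
For comparison, here is a minimal sketch of both approaches side by side. The chained form from the question does not compile because `"id"` is a plain String, which has no `isin` method; wrapping it in `col(...)` gives a Column, which does. The object name `FilterByIds`, the helpers `byIsin` and `byJoin`, the local session, and the sample rows are all assumptions made for illustration; a real job would read from `myhivetable` through a Hive-enabled session instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FilterByIds {
  // Chained filter: col("id") builds a Column, so isin is available on it.
  def byIsin(df: DataFrame, ids: Seq[Int]): DataFrame =
    df.where(col("id").isin(ids: _*))

  // Join variant: joining on Seq("id") keeps a single id column in the result.
  def byJoin(spark: SparkSession, df: DataFrame, ids: Seq[Int]): DataFrame = {
    import spark.implicits._
    df.join(ids.toDF("id"), Seq("id"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-by-ids")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for spark.sql("select * from myhivetable").
    val myDF = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    val myList = Seq(1, 3)

    byIsin(myDF, myList).show()
    byJoin(spark, myDF, myList).show()

    spark.stop()
  }
}
```

Applied to the question, the one-liner would become `spark.sql("select * from myhivetable").where(col("id").isin(myList: _*))`. For a short list `isin` is usually the simpler choice; the join may be worth considering when the list of ids is large.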