简体   繁体   中英

How to filter a column in Spark dataframe using a Array of strings?

I have to filter a column in spark dataframe using a Array[String]

I have an parameter file like below,

variable1=100,200

I read the parameter file and split each row by "=" and load in to a Map[String,String] In order to get the value, I pass the key "varaible1" and split the value by ","

val value1:Array[String] = parameterValues("varaible1").split(",")

now I need to use this value1 while filtering a dataframe.

val finalDf = testDf.filter("column1 in ($value1) and column2 in ($value1)")

I'm getting the below error,

org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '(' expecting <EOF>(line 1, pos 12)

== SQL ==
column1 in ([Ljava.lang.String;@760b9444) and column2 in ([Ljava.lang.String;@5397d41e)
------------^^^

Any suggestions?

To filter a column by an array, you can use the isin column method:

import org.apache.spark.sql.functions.col

val finalDf = testDf.filter(col("column1").isin(value1: _*) && col("column2").isin(value1: _*))

You can filter a column using an array as you've done. To correct your SQL expression, you need to do two things.

First, you forgot to put the 's' string interpolator at the start of your string representing your SQL expression, as below:

s"column1 in ($value1) and column2 in ($value1)"

And then, you need to convert your Array[String] to a well formatted String that will be understood as an SQL array. To do so, you can use mkString method on your value1 array:

value1.mkString("'", "','","'")

On your array Array("100", "200") , this method would return the string "'100','200'"

If we wrap up everything, we get the following expression:

val finalDf = testDf.filter(s"column1 in (${value1.mkString("'", "','","'")}) and column2 in (${value1.mkString("'", "','","'")})")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM