
Pass date values from a DataFrame to a query in Spark/Scala

I have a DataFrame with a date column that contains the values below.

df.show()
+----+----------+
|name|       dob|
+----+----------+
| Jon|2001-04-15|
| Ben|2002-03-01|
+----+----------+

Now I need to query a Hive table for the rows whose date of birth matches the dob values from the DataFrame above (both 2001-04-15 and 2002-03-01), so I need to pass the values in the dob column as parameters to my Hive query.

I tried collecting the values into a variable as shown below, which gives me an array of strings.

val dobRead = df.select("updt_d").distinct().as[String].collect()
dobRead: Array[String] = Array(2001-04-15, 2002-03-01)

However, when I pass it to the query, the values are not substituted properly and I get an error.

val tableRead = hive.executeQuery(s"select emp_name,emp_no,martial_status from <<table_name>> where dateOfBirth in ($dobRead)")
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to compile query: org.apache.hadoop.hive.ql.parse.ParseException: line 1:480 cannot recognize input near '(' '[' 'Ljava' in expression specification

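Can you please help me pass the date values to a query in Spark?

Option 1

Interpolating the array directly inserts its JVM toString (the [Ljava... text in the error), not its elements. You can collect the dates as follows (using Row.getAs):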

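import org.apache.spark.sql.Row
// Assumes updt_d is a string column; cast it to string first if it is a DateType.
val rows: Array[Row] = df.select("updt_d").distinct().collect()
val dates: Array[String] = rows.map(_.getAs[String](0))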

And then build the query:

val hql: String = s"select ... where dateOfBirth in (${
  dates.map(d => s"'${d}'").mkString(", ")
})"
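To make the difference concrete, here is a minimal plain-Scala sketch (no Spark needed) that prints both the broken interpolation and the quoted list. The object name InClauseSketch and the helper inList are hypothetical, and the yyyy-MM-dd check is an assumption I added so that only date-shaped values end up inside the query string.

```scala
object InClauseSketch {
  // Assumption: the values are ISO dates such as 2001-04-15.
  private val IsoDate = """\d{4}-\d{2}-\d{2}""".r

  // Hypothetical helper: wraps each date in single quotes and joins with ", ".
  def inList(values: Seq[String]): String = {
    require(values.nonEmpty, "an empty IN list is not valid SQL")
    require(values.forall(v => IsoDate.pattern.matcher(v).matches()),
      "expected yyyy-MM-dd values")
    values.map(v => s"'$v'").mkString(", ")
  }

  def main(args: Array[String]): Unit = {
    val dobRead = Array("2001-04-15", "2002-03-01")

    // Interpolating the array calls toString on a JVM array, which gives
    // a type tag plus a hash code instead of the elements.
    println(s"... where dateOfBirth in ($dobRead)")

    // Quoting each element and joining them gives a usable IN list.
    println(s"... where dateOfBirth in (${inList(dobRead.toSeq)})")
  }
}
```

The second line printed ends with in ('2001-04-15', '2002-03-01'), which is the shape the Hive parser expects.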

Option 2

If the first DataFrame holds a large number of dates, use a join instead of collecting them to the driver.

First, load each table as a DataFrame (I'll call them dfEmp and dfDates). Then join on the date fields to filter, either with a standard inner join plus filtering out null fields, or directly with a left_semi join, which keeps only the dfEmp rows that have a match in dfDates:

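import org.apache.spark.sql.functions.col
// dfDates is the DataFrame from the question (the one holding updt_d).
val dfEmp = hiveContext.table("EmpTable")
val dfEmpFiltered = dfEmp.join(dfDates,
  col("dateOfBirth") === col("updt_d"), "left_semi")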
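If you want to try the semi join without a Hive table, something like the following local sketch should work. It assumes Spark SQL is on the classpath; the object name SemiJoinSketch, the helper filterByDates, and the sample employee rows are made up for illustration, and both date columns are plain strings here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SemiJoinSketch {
  // Hypothetical helper: keeps the dfEmp rows whose dateOfBirth appears in dfDates.updt_d.
  def filterByDates(dfEmp: DataFrame, dfDates: DataFrame): DataFrame =
    dfEmp.join(dfDates.select("updt_d").distinct(),
      col("dateOfBirth") === col("updt_d"), "left_semi")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("semi-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for the question's DataFrame and the Hive table.
    val dfDates = Seq(("Jon", "2001-04-15"), ("Ben", "2002-03-01"))
      .toDF("name", "updt_d")
    val dfEmp = Seq(("Ann", 1, "2001-04-15"), ("Bob", 2, "1999-12-31"))
      .toDF("emp_name", "emp_no", "dateOfBirth")

    // Only Ann's row has a dateOfBirth present in dfDates.
    filterByDates(dfEmp, dfDates).show()

    spark.stop()
  }
}
```

A left_semi join returns only the columns of the left side, so the result has the same schema as dfEmp and no rows are duplicated even if a date occurs more than once in dfDates.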
