
Spark SQL Generating Wrong Upper and Lower Bounds for JDBC Queries

I am currently working on a POC with Spark SQL, where I need to parallelize the read operation using a Spark SQL query:

 JavaRDD<Row> dataset = sqlContext.read().jdbc(jdBcConnectionString, getSqlQuery(), "tran_id",
                lowerbound, upperbound, partitions, props).toJavaRDD();

Everything seems well and works fine until you inspect the generated queries (the database in my case is MS SQL Server).

The lower bound query is

exec sp_executesql N'SELECT * FROM table_name WHERE tran_id < 770425 or post_tran_id is null'

while the upperbound query becomes

exec sp_executesql N'SELECT * FROM table_name WHERE tran_id >= 770425'

One would think that the essence of specifying bounds is to get all rows where column value is between the specified lowerbound and upperbound, but this appears not to be the case.

I am new to Spark. Is there another way to achieve this?

One would think that the essence of specifying bounds is to get all rows where column value is between the specified lowerbound and upperbound.

It is not, and the generated conditions are correct. As usual, it is better to read the documentation than to assume:

Notice that lowerBound and upperBound are just used to decide the partition stride, not for filtering the rows in table. So all rows in the table will be partitioned and returned. This option applies only to reading.

But it looks like lowerBound is equal to upperBound in your case.
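To make the documentation quote concrete, here is a minimal Java sketch of how stride-based partition predicates can be derived from the bounds. It is a simplified illustration, not Spark's actual source: the class name PartitionSketch, the method predicates, and the "1=1" placeholder are hypothetical, and the bounds 0 and 1540850 are assumed values chosen only so that the split point matches the 770425 seen in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of stride-based partitioning; not Spark's actual source.
final class PartitionSketch {

    // Returns one WHERE predicate per partition. The bounds only set the stride:
    // the first and last partitions are open-ended, so no row is filtered out.
    static List<String> predicates(String column, long lower, long upper, int numPartitions) {
        List<String> result = new ArrayList<>();
        if (numPartitions <= 1) {
            result.add("1=1"); // sketch convention: a single unfiltered partition
            return result;
        }
        long stride = upper / numPartitions - lower / numPartitions;
        long current = lower;
        for (int i = 0; i < numPartitions; i++) {
            // The first partition has no lower clause, the last has no upper clause.
            String lowerClause = (i == 0) ? null : column + " >= " + current;
            current += stride;
            String upperClause = (i == numPartitions - 1) ? null : column + " < " + current;
            if (upperClause == null) {
                result.add(lowerClause);
            } else if (lowerClause == null) {
                // NULL keys are routed to the first partition.
                result.add(upperClause + " or " + column + " is null");
            } else {
                result.add(lowerClause + " AND " + upperClause);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical bounds; prints one predicate per partition.
        for (String p : predicates("tran_id", 0L, 1540850L, 2)) {
            System.out.println(p);
        }
    }
}
```

With two partitions the sketch yields one predicate open below the split point (plus NULLs) and one open above it, which is the shape of the two queries in the question. Rows below the lower bound or above the upper bound still land in the first or last partition.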

I am new to Spark. Is there another way to achieve this?

If you want to filter, then apply where:

dataset.where(col("tran_id").between(lowerBound, upperBound))

or use a subquery as the table argument:

sqlContext.read().jdbc(
  jdBcConnectionString,
  "(SELECT * FROM table_name WHERE tran_id BETWEEN 0 AND 42) AS t", props);
