
How can we read an invalid date column from a MySQL server in Spark (Scala) using a JDBC connection URL?

I am getting an error while reading this column from a MySQL server:

id date
1 0000-00-00
2 0000-00-01

In the data set above, we can handle 0000-00-00 by adding the MySQL Connector/J parameter zeroDateTimeBehavior=convertToNull to the connection URL,

but I don't know how to handle a date like 0000-00-01.

Please help. This is the error message I got:

Exception in User Class: org.apache.spark.SparkException : Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 11) (10.100.4.111 executor 1): java.sql.SQLException: YEAR

I am using this:

val a = "jdbc:mysql://<host_name>:3306/<database_name>?zeroDateTimeBehavior=convertToNull"

val mysqlServerDF = sparkSession.read.format("jdbc")
                .option("url", a)
                .option("query", sql)
                .option("user",jdbcUserName)
                .option("password", jdbcPassword)
                .load()

Here sql is a SQL query string, for example "select * from table".

If fixing such dates in your database is not an option, I think your best bet would be to handle it directly in your SQL query. E.g. we can compare against the valid DATE range, which is '1000-01-01' to '9999-12-31' according to the docs:

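// Return NULL for dates outside MySQL's supported DATE range
// ('1000-01-01' to '9999-12-31', both bounds inclusive).
val sql = """
  select
    id,
    case
      when cast(date as char(10))
        not between '1000-01-01' and '9999-12-31'
      then null
      else date
    end as date
  from table1"""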

val mysqlServerDF = sparkSession.read.format("jdbc")
                .option("url", a)
                .option("query", sql)
                .option("user",jdbcUserName)
                .option("password", jdbcPassword)
                .load()
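
If several date columns need the same guard, the CASE expression can be generated instead of written by hand. The following is only a minimal sketch: `DateGuard`, `nullIfOutOfRange` and `buildQuery` are hypothetical helper names, not part of Spark or the MySQL driver, and it assumes the table and column names come from trusted code, since they are interpolated into the SQL text directly.

```scala
// Hypothetical helper (not a Spark or MySQL API): builds the guarded query string.
object DateGuard {
  // Supported MySQL DATE range, as quoted in the answer above.
  val MinDate = "1000-01-01"
  val MaxDate = "9999-12-31"

  // CASE expression that yields NULL when `column` falls outside the range.
  // Assumes `column` is a trusted identifier; it is not escaped.
  def nullIfOutOfRange(column: String): String =
    s"case when cast($column as char(10)) not between '$MinDate' and '$MaxDate' " +
      s"then null else $column end as $column"

  // SELECT with plain columns passed through and date columns guarded.
  def buildQuery(table: String, plainCols: Seq[String], dateCols: Seq[String]): String = {
    val cols = plainCols ++ dateCols.map(nullIfOutOfRange)
    s"select ${cols.mkString(", ")} from $table"
  }
}
```

The resulting string can be passed to the "query" option in the same way as sql above, e.g. `DateGuard.buildQuery("table1", Seq("id"), Seq("date"))`.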
