
JDBC data type to Spark SQL DataType

I need to write a method that takes a list of column names and a list of column types (JDBC) and returns a StructType, which will be used to create a DataFrame.

I know I can write a method with a bunch of case statements to convert a JDBC column type to the appropriate DataType (such as StringType, IntegerType, etc.), but I'm wondering if such a method already exists.
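For reference, the hand-rolled case-statement approach described above could look like the following minimal sketch; the method names jdbcClassToDataType and toStructType are hypothetical, and the set of handled classes is illustrative only:

import org.apache.spark.sql.types._

// Map a Java class name (as reported by JDBC metadata) to a Spark SQL DataType.
def jdbcClassToDataType(className: String): DataType = className match {
  case "java.lang.String"   => StringType
  case "java.lang.Long"     => LongType
  case "java.lang.Integer"  => IntegerType
  case "java.lang.Double"   => DoubleType
  case "java.lang.Boolean"  => BooleanType
  case "java.sql.Timestamp" => TimestampType
  case "java.sql.Date"      => DateType
  case other                => throw new IllegalArgumentException(s"Unsupported class: $other")
}

// Zip the two lists into a StructType.
def toStructType(names: Seq[String], classNames: Seq[String]): StructType =
  StructType(names.zip(classNames).map { case (n, c) =>
    StructField(n, jdbcClassToDataType(c), nullable = true)
  })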

There's a DataType.fromJson method, but I don't understand the structure of the JSON I need to pass to it.

Example input:

List of column names: UserName, Age, Salary
List of column types: java.lang.String, java.lang.Long, java.lang.Double

If you have access to a JDBC source with a table having the given schema, you can simply copy the schema from there:

val jdbcOptions: Map[String, String] = ???
val jdbcSchema = sqlContext.load("jdbc", jdbcOptions).schema
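On Spark 1.4 and later the same schema can also be obtained through the DataFrameReader API; a minimal sketch, assuming the same jdbcOptions map contains at least the url and dbtable keys:

// Equivalent schema lookup via DataFrameReader (Spark 1.4+).
val jdbcSchemaViaReader = sqlContext.read.format("jdbc").options(jdbcOptions).load().schema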

The JSON representation is quite simple. Each StructField is represented as a document with the fields metadata, name, nullable and type.

{"metadata":{},"name":"f","nullable":true,"type":"string"}

For most applications you can ignore metadata and focus on the remaining three. The tricky part is mapping from a Java class name to type, but a naive solution can look like this:

import net.liftweb.json.JsonDSL._
import net.liftweb.json.{compact, render}
import org.apache.spark.sql.types.{DataType, StructType}

val columns = Seq(
    ("UserName", "java.lang.String"),
    ("Age", "java.lang.Long"),
    ("Salary", "java.lang.Double")
).map{case (n, t) => (n, t.split("\\.").last.toLowerCase)}

val fields = columns.map {case (n, t) => (
    ("metadata" -> Map.empty[String, String]) ~
    ("name" -> n) ~
    ("nullable" -> false) ~
    ("type" -> t)
)}

val schemaJSON = compact(render(("fields" -> fields) ~ ("type" -> "struct")))
val schema = DataType.fromJson(schemaJSON).asInstanceOf[StructType]
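The resulting schema can then be applied when constructing the DataFrame; a hypothetical usage, assuming an existing RDD[Row] named rowRDD:

// Apply the generated schema to an RDD of rows.
val df = sqlContext.createDataFrame(rowRDD, schema)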

