
Get Hashtable/Map from a Spark DataFrame column stored as binary (serialized Hashtable) in SQL Server 2016 using Apache Spark 2.4

In one of our legacy applications, in the database (SQL Server 2016), we have a table, Measures.

It has 15+ columns; one of them is binary.


When I load the table into Spark and print the schema, the column is binary:

scala> jdbcDF.printSchema()
root
 |-- measurementValues: binary (nullable = true)
 |-- measure: string (nullable = true)

It looks like they used a java.util.Hashtable, serialized it, and stored it in the table column as binary.
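
For reference, this is presumably how the column was populated (a minimal sketch using plain Java serialization; the keys and values are made up here, and the actual legacy writer is not shown):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util

// Hypothetical writer: serialize a Hashtable into the byte[] that
// ends up in the binary column
val ht = new util.Hashtable[String, String]()
ht.put("height", "182")
ht.put("weight", "77")

val bos = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(bos)
oos.writeObject(ht)
oos.close()
val measurementValues: Array[Byte] = bos.toByteArray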


I am trying to deserialize it back into a Hashtable (or a Map, or some other collection) so that I can convert it to JSON format during ETL operations.

Can anyone help here? I tried converting the binary to a string, but that was of no use :(

// naive attempt: treats the serialized bytes as text
val convertToString = udf((a: Array[Byte]) => new String(a))

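Converting to a string cannot work: a Java serialization stream is a binary format, not text. Every such stream begins with the magic bytes 0xAC 0xED followed by the version 0x00 0x05, so new String renders it as garbage. A quick sanity check (a sketch; looksJavaSerialized is a hypothetical helper, assuming the bytes come straight from the column):

// A Java serialization stream starts with STREAM_MAGIC (0xAC 0xED)
def looksJavaSerialized(bytes: Array[Byte]): Boolean =
  bytes.length >= 2 && bytes(0) == 0xAC.toByte && bytes(1) == 0xED.toByte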

  import java.io.{ByteArrayInputStream, ObjectInputStream}
  import java.util
  import scala.collection.JavaConverters._

  def deserializeBinary = udf((x: Array[Byte]) => {
    val obs = new ObjectInputStream(new ByteArrayInputStream(x))
    // Spark cannot encode java.util.Hashtable; return a Scala Map instead
    val stock = obs.readObject.asInstanceOf[util.Hashtable[String, String]]
    obs.close()
    stock.asScala.toMap
  })
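
For clarity, this is the end-to-end flow I am aiming for, assuming the bytes really are a plain Java-serialized Hashtable[String, String]. With the UDF returning a Scala Map, the resulting MapType column can be turned into JSON with the built-in to_json function, which handles MapType columns in Spark 2.4 (a sketch against the jdbcDF loaded earlier; the column names come from the schema above):

import org.apache.spark.sql.functions.{col, to_json}

val withJson = jdbcDF
  .withColumn("measurementMap", deserializeBinary(col("measurementValues")))
  .withColumn("measurementJson", to_json(col("measurementMap")))

withJson.select("measure", "measurementJson").show(truncate = false)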
