
Get a Hashtable/Map from a Spark DataFrame column stored as binary (serialized Hashtable) in SQL Server 2016 using Apache Spark 2.4

In one of our legacy applications, the database (SQL Server 2016) has a table named Measures.

It has 15+ columns, one of which is binary.


When I load it into Spark and print the schema, the column shows as binary:

scala> jdbcDF.printSchema()
root
 |-- measurementValues: binary (nullable = true)
 |-- measure: string (nullable = true)

It looks like they used a Java Hashtable, serialized it, and stored it in the table column as binary.


I am trying to deserialize it back into a Hashtable, Map, or some other collection so that I can convert it to JSON format during ETL operations.

Can anyone help here? I tried converting the binary to a string, but that was of no use:

val convertToString = udf((a: Array[Byte])=> new String(a))


  import java.io.{ByteArrayInputStream, ObjectInputStream}
  import java.util
  import scala.collection.JavaConverters._

  def deserializeBinary = udf((x: Array[Byte]) => {
    val stream = new ByteArrayInputStream(x)
    val obs = new ObjectInputStream(stream)
    // Spark cannot encode java.util.Hashtable as a UDF return type,
    // so convert it to an immutable Scala Map (encoded as MapType)
    try obs.readObject.asInstanceOf[util.Hashtable[String, String]].asScala.toMap
    finally obs.close()
  })
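Outside of Spark, the deserialization logic can be checked with a plain round trip. This is a minimal sketch, assuming the legacy app used standard Java serialization; the `serialize`/`deserialize` helper names and the sample keys are illustrative, not from the original application:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util
import scala.collection.JavaConverters._

// Serialize a Hashtable the way the legacy app presumably did
def serialize(table: util.Hashtable[String, String]): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(table)
  oos.close()
  bos.toByteArray
}

// Deserialize the bytes and convert to an immutable Scala Map
def deserialize(bytes: Array[Byte]): Map[String, String] = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  try ois.readObject.asInstanceOf[util.Hashtable[String, String]].asScala.toMap
  finally ois.close()
}

val ht = new util.Hashtable[String, String]()
ht.put("temp", "98.6")
ht.put("unit", "F")

val roundTripped: Map[String, String] = deserialize(serialize(ht))
```

Once the UDF returns a Scala Map, Spark infers a MapType(StringType, StringType) column, and `org.apache.spark.sql.functions.to_json` can then turn it into a JSON string during the ETL.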
