
spark scala - get a value from json jdbc column

The MySQL JDBC data source that I use to load data into Spark has a column that contains JSON stored as a string.

// JDBC Connection and load table in Dataframe
val verDf = spark.read.format("jdbc").option("driver", driver).option("url", url).option("dbtable", verticesTable).option("user", user).option("password", pass).load()
verDf.printSchema
root
 |-- id: integer (nullable = true)
 |-- url: string (nullable = true)
 |-- al: string (nullable = true) -->> this is JSON string
 |-- batch_id: integer (nullable = true)
 |-- x: double (nullable = true)
 |-- y: double (nullable = true)
 |-- z: double (nullable = true)
 |-- size: double (nullable = true)

The JSON is in the al column and only a single value from it is required. How can I extract it? I've seen the from_json/schema_of_json approach and it looks expensive and bulky: a schema has to be created first, then the JSON is unwrapped into a Map, etc.

val schema = schema_of_json(lit(verDf.select($"al").as[String].first))

When I run the line above, it either times out or runs for several minutes (4-8 minutes).

  • I don't understand why: it should just take the first row and parse its JSON to produce a schema. Why does it take so long (the JSON value is about 1-2 kilobytes, a really small object)?
  • Is there a function similar to MySQL's json_extract() that extracts a value from a JSON string relatively quickly?
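On the first point, one possible explanation (an assumption, not something confirmed in this thread): schema_of_json on a literal is cheap, but verDf.select($"al").as[String].first is an action that makes Spark read from the JDBC source, so the time may be going into the MySQL read and not into JSON parsing. A minimal sketch of one way to test that is to let MySQL apply the limit itself through the JDBC reader's query option. The object and helper names below are hypothetical; the connection parameters stand in for the values used in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SampleOneRow {
  // Hypothetical helper: loads a single `al` value using a query that MySQL limits itself,
  // so Spark only ever receives one row from the database.
  def sampleAl(spark: SparkSession, driver: String, url: String,
               table: String, user: String, pass: String): DataFrame =
    spark.read.format("jdbc")
      .option("driver", driver)
      .option("url", url)
      // "query" replaces "dbtable"; the two options cannot be combined
      .option("query", s"select al from $table limit 1")
      .option("user", user)
      .option("password", pass)
      .load()
}
```

If sampleAl(...).as[String].first returns quickly while the original line does not, the slow part is probably the table read and not schema inference.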

The function get_json_object works exactly as required. It takes 2 arguments, the column name and a JSON path:

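// "table", "key" and "value" are placeholders; the DataFrame has to be registered as a view under that name first
val newsh = spark.sql("select get_json_object(table.al, '$.key1.paramName') from table where table.key = value")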
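get_json_object is also available in the DataFrame API (org.apache.spark.sql.functions), so a temp view is not strictly needed. The sketch below uses a small in-memory DataFrame as a stand-in for the JDBC-loaded verDf; the sample JSON documents and the key1.paramName path are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

object GetJsonObjectSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("get-json-object-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for verDf: an id column plus the JSON string column `al`
    val verDf = Seq(
      (1, """{"key1": {"paramName": "abc"}}"""),
      (2, """{"key1": {"other": 1}}""")
    ).toDF("id", "al")

    // Extract one value by JSON path; no schema has to be declared up front
    val extracted = verDf.select(
      $"id",
      get_json_object($"al", "$.key1.paramName").as("paramName")
    )

    extracted.show(false)
    spark.stop()
  }
}
```

The extracted column is a string, and rows where the path is missing (like the second sample row) come back as null, so a cast may be needed if the value is numeric.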
