
JSON parsing with an array containing a struct of structs using Spark

My JSON (stored in minijson.json):

{
  "arr": [
    {
      "st1": {},
      "st2": {
        "a": {}
      },
      "val": 0.0,
      "x": "1"
    }
  ]
}

I am using Spark version 2.1.0 to read the JSON.

Read code:

minidf = spark.read.json("minijson.json")
minidf.printSchema()

Output:

root
 |-- arr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- val: double (nullable = true)
 |    |    |-- x: string (nullable = true)

I do not understand why it does not detect the st1, st2, and a fields, which are present in the JSON.

Please help me solve this problem.

Spark does not create a column when a key's value is an empty object, as in "abc": {}. The object must contain at least one key and value for Spark to create a column.

Below is a simple example that adds a key and value to each empty object:
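To check ahead of time which fields are affected, one option is to list every empty object in the file before handing it to Spark. The following is a minimal sketch that uses only Python's standard json module; the helper name find_empty_objects is hypothetical and is not part of Spark.

```python
import json

def find_empty_objects(node, path=""):
    """Return the paths of all empty JSON objects ({}) found under node."""
    found = []
    if isinstance(node, dict):
        if not node:
            # An empty object: record where it sits in the document.
            found.append(path)
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found.extend(find_empty_objects(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_empty_objects(item, f"{path}[{index}]"))
    return found

# Same content as minijson.json from the question.
doc = json.loads('{"arr": [{"st1": {}, "st2": {"a": {}}, "val": 0.0, "x": "1"}]}')
empty_paths = find_empty_objects(doc)
print(empty_paths)  # ['arr[0].st1', 'arr[0].st2.a']
```

Note that st2 itself is not empty, but its only field a is, which matches the question's output where st2 is missing as well.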

{
  "arr": [
    {
      "st1": {
        "name": ""
      },
      "st2": {
        "a": {
          "abc": ""
        }
      },
      "val": 0.0,
      "x": "1"
    }
  ]
}

Which gives each element of arr the following schema (the enclosing arr array level is omitted here):

root
 |-- st1: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |-- st2: struct (nullable = true)
 |    |-- a: struct (nullable = true)
 |    |    |-- abc: string (nullable = true)
 |-- val: double (nullable = true)
 |-- x: string (nullable = true)
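If editing the file by hand is not practical, the same workaround can be applied programmatically before the data reaches Spark. This is only a sketch under stated assumptions: the helper fill_empty_objects and the key name "placeholder" are hypothetical choices made for illustration, and it relies solely on Python's standard json module.

```python
import json

def fill_empty_objects(node, key="placeholder", value=""):
    """Recursively add one key/value pair to every empty JSON object."""
    if isinstance(node, dict):
        if not node:
            # Empty object: give it a single dummy field.
            return {key: value}
        return {k: fill_empty_objects(v, key, value) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_empty_objects(item, key, value) for item in node]
    return node  # Scalars are returned unchanged.

# Same content as minijson.json from the question.
original = json.loads('{"arr": [{"st1": {}, "st2": {"a": {}}, "val": 0.0, "x": "1"}]}')
filled = fill_empty_objects(original)
print(json.dumps(filled))
```

json.dumps writes the document on a single line, which is the layout spark.read.json expects by default (one JSON object per line). The dummy field then appears in the inferred schema and can be ignored or dropped afterwards.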
