Parse & flatten JSON object in a text file using Spark & Scala into Dataframe
Parse JSON file using Spark Scala
I have a JSON source data file as shown below, and I need the "expected result" in a completely different format, which is also shown below. Is there a way to achieve this using Spark Scala? Thanks for your help.
JSON source data file:
{
  "APP": [
    {
      "E": 1566799999225,
      "V": 44.0
    },
    {
      "E": 1566800002758,
      "V": 61.0
    }
  ],
  "ASP": [
    {
      "E": 1566800009446,
      "V": 23.399999618530273
    }
  ],
  "TT": 0,
  "TVD": [
    {
      "E": 1566799964040,
      "V": 50876515
    }
  ],
  "VIN": "FU74HZ501740XXXXX"
}
Expected result:
JSON schema:
|-- APP: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- E: long (nullable = true)
| | |-- V: double (nullable = true)
|-- ASP: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- E: long (nullable = true)
| | |-- V: double (nullable = true)
|-- ATO: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- E: long (nullable = true)
| | |-- V: double (nullable = true)
|-- MSG_TYPE: string (nullable = true)
|-- RPM: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- E: long (nullable = true)
| | |-- V: double (nullable = true)
|-- TT: long (nullable = true)
|-- TVD: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- E: long (nullable = true)
| | |-- V: long (nullable = true)
|-- VIN: string (nullable = true)
You can start by reading the JSON file:
val inputDataFrame: DataFrame = sparkSession
  .read
  .option("multiline", true)
  .json(yourJsonPath)
Then you can create a simple rule to pick out APP, ASP, ATO, since they are the only fields in the input with an array-of-struct data type:
import org.apache.spark.sql.types.{ArrayType, StructField}

val inputDataFrameFields: Array[StructField] = inputDataFrame.schema.fields
var snColumn = new Array[String](inputDataFrame.schema.length)
for (x <- 0 to (inputDataFrame.schema.length - 1)) {
  if (inputDataFrameFields(x).dataType.isInstanceOf[ArrayType] && !inputDataFrameFields(x).name.isEmpty) {
    snColumn(x) = inputDataFrameFields(x).name
  }
}
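The loop above leaves null slots in snColumn for the non-array fields. As a side note, the same schema inspection can be written more concisely; this is a sketch using the same Spark schema API, with `arrayColumnNames` as a hypothetical name:

```scala
import org.apache.spark.sql.types.ArrayType

// Equivalent sketch: keep only the names of the array-typed top-level
// columns (APP, ASP, TVD for the sample input), with no null placeholders.
val arrayColumnNames: Array[String] = inputDataFrame.schema.fields
  .collect { case f if f.dataType.isInstanceOf[ArrayType] => f.name }
```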
Then create an empty dataframe as follows and populate it:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val outputSchema = StructType(
  List(
    StructField("VIN", StringType, true),
    StructField(
      "EVENTS",
      ArrayType(
        StructType(Array(
          StructField("SN", StringType, true),
          StructField("E", LongType, true), // E holds epoch milliseconds, which overflow IntegerType
          StructField("V", DoubleType, true)
        )))),
    StructField("TT", StringType, true)
  )
)
val outputDataFrame = sparkSession.createDataFrame(sparkSession.sparkContext.emptyRDD[Row], outputSchema)
Then you need to create some UDFs to parse your input and do the correct mapping.
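As an alternative to a UDF, a minimal sketch of that mapping can be written with Spark's built-in higher-order function `transform` (Spark 2.4+), which tags every array element with its source column name (SN) before concatenating everything into a single EVENTS array. The names `arrayCols`, `tagged`, and `eventsDataFrame` are hypothetical, not from the original answer:

```scala
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.ArrayType

// Names of the array-typed top-level columns (APP, ASP, TVD in the sample).
val arrayCols = inputDataFrame.schema.fields
  .filter(_.dataType.isInstanceOf[ArrayType])
  .map(_.name)

// For each array column, tag its elements with the column name as SN and
// cast V to double so all arrays share one element type.
val tagged = arrayCols.map { c =>
  expr(s"transform($c, x -> named_struct('SN', '$c', 'E', x.E, 'V', CAST(x.V AS DOUBLE)))")
}

// concat on array columns (Spark 2.4+) merges them into one EVENTS array.
val eventsDataFrame = inputDataFrame.select(
  col("VIN"),
  concat(tagged: _*).as("EVENTS"),
  col("TT").cast("string").as("TT")
)
```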
Hope this helps.
Here is a solution that parses the JSON into a Spark dataframe that fits your data:
val input = "{\"APP\":[{\"E\":1566799999225,\"V\":44.0},{\"E\":1566800002758,\"V\":61.0}],\"ASP\":[{\"E\":1566800009446,\"V\":23.399999618530273}],\"TT\":0,\"TVD\":[{\"E\":1566799964040,\"V\":50876515}],\"VIN\":\"FU74HZ501740XXXXX\"}"
import sparkSession.implicits._
import org.apache.spark.sql.functions.{col, explode}

val outputDataFrame = sparkSession.read
  .option("multiline", true).option("mode", "PERMISSIVE")
  .json(Seq(input).toDS)
  .withColumn("APP", explode(col("APP")))
  .withColumn("ASP", explode(col("ASP")))
  .withColumn("TVD", explode(col("TVD")))
  .select(
    col("VIN"), col("TT"),
    col("APP").getItem("E").as("APP_E"),
    col("APP").getItem("V").as("APP_V"),
    col("ASP").getItem("E").as("ASP_E"),
    col("ASP").getItem("V").as("ASP_V"),
    col("TVD").getItem("E").as("TVD_E"),
    col("TVD").getItem("V").as("TVD_V")
  )
outputDataFrame.show(truncate = false)
/*
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
|VIN              |TT |APP_E        |APP_V|ASP_E        |ASP_V             |TVD_E        |TVD_V   |
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
|FU74HZ501740XXXXX|0  |1566799999225|44.0 |1566800009446|23.399999618530273|1566799964040|50876515|
|FU74HZ501740XXXXX|0  |1566800002758|61.0 |1566800009446|23.399999618530273|1566799964040|50876515|
+-----------------+---+-------------+-----+-------------+------------------+-------------+--------+
*/
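Two caveats worth noting with this approach (my observations, not part of the original answer): chaining several explodes multiplies rows, so the result has |APP| × |ASP| × |TVD| rows (2 × 1 × 1 = 2 here), and explode drops a record entirely when one of its arrays is null or empty. If records may be missing, say, ASP, a sketch of the same pipeline with explode_outer preserves them by emitting a single null element instead:

```scala
import org.apache.spark.sql.functions.{col, explode_outer}

// Same pipeline, but explode_outer keeps records whose arrays are null or
// empty, producing one row with a null element for that column.
val safeDataFrame = sparkSession.read
  .option("multiline", true)
  .json(Seq(input).toDS)
  .withColumn("APP", explode_outer(col("APP")))
  .withColumn("ASP", explode_outer(col("ASP")))
  .withColumn("TVD", explode_outer(col("TVD")))
```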