How to read a nested JSON in Spark Scala?
Here is my nested JSON file:
{
  "dc_id": "dc-101",
  "source": {
    "sensor-igauge": {
      "id": 10,
      "ip": "68.28.91.22",
      "description": "Sensor attached to the container ceilings",
      "temp": 35,
      "c02_level": 1475,
      "geo": {"lat": 38.00, "long": 97.00}
    },
    "sensor-ipad": {
      "id": 13,
      "ip": "67.185.72.1",
      "description": "Sensor ipad attached to carbon cylinders",
      "temp": 34,
      "c02_level": 1370,
      "geo": {"lat": 47.41, "long": -122.00}
    },
    "sensor-inest": {
      "id": 8,
      "ip": "208.109.163.218",
      "description": "Sensor attached to the factory ceilings",
      "temp": 40,
      "c02_level": 1346,
      "geo": {"lat": 33.61, "long": -111.89}
    },
    "sensor-istick": {
      "id": 5,
      "ip": "204.116.105.67",
      "description": "Sensor embedded in exhaust pipes in the ceilings",
      "temp": 40,
      "c02_level": 1574,
      "geo": {"lat": 35.93, "long": -85.46}
    }
  }
}
How can I read this JSON file into a DataFrame with Spark Scala? There is no array in the JSON, so I can't use explode directly. Can anyone help?
import org.apache.spark.sql.functions._

// "multiline" is required because the JSON object spans multiple lines
val df = spark.read.option("multiline", true).json("data/test.json")

df
  // array("source.*") gathers the four sensor structs into one array column,
  // so explode can turn them into one row per sensor
  .select(col("dc_id"), explode(array("source.*")) as "level1")
  .withColumn("id", col("level1.id"))
  .withColumn("ip", col("level1.ip"))
  .withColumn("temp", col("level1.temp"))
  .withColumn("description", col("level1.description"))
  .withColumn("c02_level", col("level1.c02_level"))
  .withColumn("lat", col("level1.geo.lat"))
  .withColumn("long", col("level1.geo.long"))
  .drop("level1")
  .show(false)
Sample output:
+------+---+---------------+----+------------------------------------------------+---------+-----+-------+
|dc_id |id |ip |temp|description |c02_level|lat |long |
+------+---+---------------+----+------------------------------------------------+---------+-----+-------+
|dc-101|10 |68.28.91.22 |35 |Sensor attached to the container ceilings |1475 |38.0 |97.0 |
|dc-101|8 |208.109.163.218|40 |Sensor attached to the factory ceilings |1346 |33.61|-111.89|
|dc-101|13 |67.185.72.1 |34 |Sensor ipad attached to carbon cylinders |1370 |47.41|-122.0 |
|dc-101|5 |204.116.105.67 |40 |Sensor embedded in exhaust pipes in the ceilings|1574 |35.93|-85.46 |
+------+---+---------------+----+------------------------------------------------+---------+-----+-------+
Instead of selecting each column by hand, you can try writing a generic UDF or helper to pull out all the individual columns.
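One way to make that generic without a UDF is a small helper that walks the DataFrame's schema and builds the column list itself. This is a sketch of my own, not part of the original answer; the `flattenSchema` name and the commented usage are assumptions that follow the `level1` alias used above:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Recursively collect one Column per leaf field, flattening nested structs
// and naming each leaf by its dotted path (e.g. "geo.lat" -> "geo_lat").
def flattenSchema(schema: StructType, prefix: String = ""): Seq[Column] =
  schema.fields.flatMap { field =>
    val path = if (prefix.isEmpty) field.name else s"$prefix.${field.name}"
    field.dataType match {
      case st: StructType => flattenSchema(st, path)
      case _              => Seq(col(path).as(path.replace(".", "_")))
    }
  }

// Hypothetical usage with the exploded DataFrame from above:
// val exploded = df.select(col("dc_id"), explode(array("source.*")) as "level1")
// val level1Type = exploded.schema("level1").dataType.asInstanceOf[StructType]
// exploded.select(col("dc_id") +: flattenSchema(level1Type, "level1"): _*).show(false)
```

The helper only recurses into `StructType` fields; arrays and maps would need an `explode` first, as in the answers here.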
Note: tested with Spark 2.3.
Assuming the JSON string has been read into a variable called jsonString:
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import spark.implicits._

// Parse the JSON string directly from a Dataset[String]
val df = spark.read.json(Seq(jsonString).toDS)
val df1 = df.withColumn("lat", explode(array("source.sensor-igauge.geo.lat")))
You can follow the same steps for other structures as well, such as map and array structures.
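As a concrete illustration of the map variant: if you supply an explicit schema that declares `source` as a `MapType` instead of letting Spark infer one struct field per sensor, `explode` works directly on the map and yields one row per sensor. This is a sketch of my own, tested against a trimmed-down inline version of the JSON; the schema fields and session setup are assumptions, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("nested-json-map").getOrCreate()
import spark.implicits._

// A trimmed-down, single-line version of the document's JSON, inlined for brevity.
val jsonString = """{"dc_id":"dc-101","source":{"sensor-igauge":{"id":10,"temp":35,"geo":{"lat":38.0,"long":97.0}},"sensor-ipad":{"id":13,"temp":34,"geo":{"lat":47.41,"long":-122.0}}}}"""

// Declare "source" as a map from sensor name to a sensor struct, instead of
// letting Spark infer a struct with one field per sensor.
val sensorSchema = new StructType()
  .add("id", LongType)
  .add("temp", LongType)
  .add("geo", new StructType().add("lat", DoubleType).add("long", DoubleType))

val schema = new StructType()
  .add("dc_id", StringType)
  .add("source", MapType(StringType, sensorSchema))

val df = spark.read.schema(schema).json(Seq(jsonString).toDS)

// explode on a MapType column produces two columns, "key" (the sensor name)
// and "value" (the sensor struct) -- one row per map entry.
val flat = df
  .select($"dc_id", explode($"source"))
  .select($"dc_id", $"key" as "sensor", $"value.id", $"value.temp")

flat.show(false)
```

With the full file you would replace the inline string with `spark.read.schema(schema).option("multiline", true).json("data/test.json")`. The advantage over `array("source.*")` is that the sensor name survives as its own column.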
import org.apache.spark.sql.functions._
import spark.implicits._

val df = spark.read.option("multiline", true).json("myfile.json")

// explode(array("source.*")) turns the four sensor structs into four rows;
// the exploded struct lands in a column named "col"
df.select($"dc_id", explode(array("source.*")))
  .select($"dc_id", $"col.c02_level", $"col.description", $"col.geo.lat", $"col.geo.long", $"col.id", $"col.ip", $"col.temp")
  .show(false)
Output:
+------+---------+------------------------------------------------+-----+-------+---+---------------+----+
|dc_id |c02_level|description |lat |long |id |ip |temp|
+------+---------+------------------------------------------------+-----+-------+---+---------------+----+
|dc-101|1475 |Sensor attached to the container ceilings |38.0 |97.0 |10 |68.28.91.22 |35 |
|dc-101|1346 |Sensor attached to the factory ceilings |33.61|-111.89|8 |208.109.163.218|40 |
|dc-101|1370 |Sensor ipad attached to carbon cylinders |47.41|-122.0 |13 |67.185.72.1 |34 |
|dc-101|1574 |Sensor embedded in exhaust pipes in the ceilings|35.93|-85.46 |5 |204.116.105.67 |40 |
+------+---------+------------------------------------------------+-----+-------+---+---------------+----+