Save Spark Dataframe into JSON

I have a data frame that I want to save as JSON. It contains a column holding JSON text. When I write the data frame out as JSON, that column is saved as a string field instead of as nested JSON.

Spark version: 2.4.0, language: Scala

Data frame:

+--------+---------------------------------+
|id      |  jsoncolumn                     |
+--------+---------------------------------+
|1000    | [{"A": 10}, {"A": 20, "B": 50}] |
+--------+---------------------------------+

When I use df.write.json("path"), I get the output below, where jsoncolumn is saved as a string:

{
  "id": 1000,
  "jsoncolumn": "[{\"A\": 10}, {\"A\": 20, \"B\": 50}]"
}

Expected output:

{
  "id": 1000,
  "jsoncolumn": [
    {
      "A": 10
    },
    {
      "A": 20,
      "B": 50
    }
  ]
}

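You can parse the string column with from_json before writing, which converts it from StringType to an array of structs. The schema here is inferred from the first row's value with schema_of_json:

import org.apache.spark.sql.functions.{from_json, schema_of_json}
import spark.implicits._  // enables the $"..." column syntax

// Column name as in this answer's test data; adjust it to match your data frame
val value = df.first().getAs[String]("jsonColumn")
df.withColumn("jsonColumn", from_json($"jsonColumn", schema_of_json(value)))
  .write.json("output/test")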

Output:

{
  "id": "1000",
  "jsonColumn": [
    {
      "A": 10
    },
    {
      "A": 20,
      "B": 50
    }
  ]
}
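
Because schema_of_json looks only at the one sample value passed to it, keys that appear in other rows but not in that sample would not be part of the inferred schema. If the structure is known in advance, a fixed schema can be passed to from_json instead. The following is a minimal sketch, not part of the original answer: it assumes the only keys are "A" and "B", that both hold integers, and that Spark runs in local mode; the object name JsonColumnExample and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, LongType, StructField, StructType}

object JsonColumnExample {
  // Assumed schema: an array of objects with optional integer keys "A" and "B".
  val jsonSchema: ArrayType = ArrayType(
    StructType(Seq(
      StructField("A", LongType, nullable = true),
      StructField("B", LongType, nullable = true)
    ))
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-column-example")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the question's data frame.
    val df = Seq(
      (1000, """[{"A": 10}, {"A": 20, "B": 50}]""")
    ).toDF("id", "jsoncolumn")

    // Parse the string column into array<struct<A, B>> using the fixed schema.
    val parsed = df.withColumn("jsoncolumn", from_json(col("jsoncolumn"), jsonSchema))

    // The column should now be reported as an array of structs, not a string.
    parsed.printSchema()

    // mode("overwrite") lets the example be re-run against the same path.
    parsed.write.mode("overwrite").json("output/test")

    spark.stop()
  }
}
```

With a fixed schema, values that do not match it are parsed as null rather than causing the write to fail, so it is worth checking a few parsed rows before relying on the output.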
