Spark JSON Array

I have a Spark DataFrame with the following columns:

uuid|some_data
"A" |"ABC"
"B" |"DEF"

I need to convert each row into nested JSON of the following format:

{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}
{"data":[{"attributes":[{"uuid":"B","some_data":"DEF"}]}]}

I tried the following code to achieve this:

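import org.apache.spark.sql.functions.{column, struct, to_json}

// Serialize every column of the row into a JSON string column "attributes"
val jsonDF = dataFrame.select(
  to_json(struct(dataFrame.columns.map(column): _*)).alias("attributes")
)

// Wrap that string column in a struct and serialize it again as "data"
val jsonDF2 = jsonDF.select(
  to_json(struct(jsonDF.columns.map(column): _*)).alias("data")
)

// Wrap once more and serialize as "value"
val jsonDF3 = jsonDF2.select(
  to_json(struct(jsonDF2.columns.map(column): _*)).alias("value")
).selectExpr("CAST(value AS STRING)")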

Instead, I ended up with this format:

{"data": {"attributes": {"uuid":"A","some_data":"ABC"}}}
{"data": {"attributes": {"uuid":"B","some_data":"DEF"}}}

What changes do I need to make to get the required format?
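The attempt above never creates an array column, so the [ ] brackets cannot appear. It also cannot produce real nesting: to_json returns a plain string column, so feeding its result into another to_json serializes the inner document as an escaped string rather than as a nested object. A minimal sketch that checks both points, assuming a local SparkSession (the local[1] master and app name are arbitrary choices made only so the example runs on its own):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder().master("local[1]").appName("to-json-nesting").getOrCreate()
import spark.implicits._

val df = Seq(("A", "ABC")).toDF("uuid", "some_data")

// First step of the attempt: "attributes" is already a JSON string, not a struct
val once = df.select(to_json(struct(df.columns.map(col).toSeq: _*)).alias("attributes"))
val attributesIsString = once.schema("attributes").dataType == StringType

// Second step: serializing that string again escapes it instead of nesting it
val twice = once.select(to_json(struct(col("attributes"))).alias("data"))
val encodedTwice = twice.as[String].head()

println(attributesIsString) // true
println(encodedTwice)       // inner document shows up as a quoted string with \" escapes
spark.stop()
```

The fix is therefore to keep everything as struct and array columns and call to_json exactly once, on the outermost struct.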

Each JSON object needs its own struct. In addition, you need one array to wrap the data field and another one to wrap the attributes field, and to_json should be applied only once, to the outermost struct:

import org.apache.spark.sql.functions.{array, struct, to_json}
import spark.implicits._   // for $"..." and toDF (already in scope in spark-shell)

val jsonData = struct(     // Outermost JSON document
  array(                   // Data field as an array
    struct(                // Intermediate JSON document with attributes field
      array(               // Innermost array
        struct(            // Innermost JSON document 
          $"uuid",         // Payload, you can use df.columns.map(col):_* instead
          $"some_data"
        )  
      ) as "attributes"    // Alias for innermost array field
    )
) as "data")               // Alias for array field
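As the payload comment notes, the two hard-coded columns can be replaced with df.columns.map(col):_*. A minimal sketch of that generic variant wrapped in a helper function, run on both rows from the question. nestedJson is a hypothetical name, and the local SparkSession exists only to make the example self-contained:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct, to_json}

val spark = SparkSession.builder().master("local[1]").appName("nested-json").getOrCreate()
import spark.implicits._

// Hypothetical helper: builds {"data":[{"attributes":[{<all columns>}]}]} for each row
def nestedJson(df: DataFrame): Column = {
  val payload    = struct(df.columns.map(col).toSeq: _*) // innermost object: every column of the row
  val attributes = array(payload).as("attributes")       // [ {...} ] under the name "attributes"
  val data       = array(struct(attributes)).as("data")  // [ {"attributes": [...]} ] under "data"
  to_json(struct(data))                                  // outermost object, serialized once
}

val df = Seq(("A", "ABC"), ("B", "DEF")).toDF("uuid", "some_data")
val lines = df.select(nestedJson(df).as("value")).as[String].collect().toSeq
lines.foreach(println)
spark.stop()
```

Because struct takes its field names from the column aliases, adding or renaming columns in the input DataFrame changes the innermost object without touching the helper.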

Applying jsonData to a one-row DataFrame:

Seq(("A", "ABC"))
  .toDF("uuid", "some_data")
  .select(to_json(jsonData) as "data")
  .show(false)
+----------------------------------------------------------+
|data                                                      |
+----------------------------------------------------------+
|{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}|
+----------------------------------------------------------+
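The same shape can also be written as a Spark SQL expression, where named_struct plays the role of struct (with explicit field names instead of aliases) and array wraps each level. A sketch of this alternative formulation, assuming the question's two sample rows and a local SparkSession created only for the example:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("nested-json-sql").getOrCreate()
import spark.implicits._

val df = Seq(("A", "ABC"), ("B", "DEF")).toDF("uuid", "some_data")

// named_struct builds a JSON object with explicit keys; array builds a JSON array
val lines = df.selectExpr(
  """to_json(named_struct(
       'data', array(named_struct(
         'attributes', array(named_struct('uuid', uuid, 'some_data', some_data))))))
     AS value"""
).as[String].collect().toSeq

lines.foreach(println)
spark.stop()
```

This form is convenient when the transformation has to live in a SQL query or a config string rather than in Scala code.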
