I have a Spark DataFrame with the following columns:
uuid|some_data
"A" |"ABC"
"B" |"DEF"
I need to convert this into nested JSON of the following format:
{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}
{"data":[{"attributes":[{"uuid":"B","some_data":"DEF"}]}]}
I tried the following code to achieve this:
val jsonDF = dataFrame.select(
  to_json(struct(dataFrame.columns.map(col): _*)).alias("attributes")
)
val jsonDF2 = jsonDF.select(
  to_json(struct(jsonDF.columns.map(col): _*)).alias("data")
)
val jsonDF3 = jsonDF2.select(
  to_json(struct(jsonDF2.columns.map(col): _*)).alias("value")
).selectExpr("CAST(value AS STRING)")
but ended up with the following format:
{"data": {"attributes": {"uuid":"A","some_data":"ABC}}}
{"data": {"attributes": {"uuid":"B","some_data":"DEF}}}
Please let me know what changes I need to make to get it to the required format.
Each JSON document requires its own struct. Additionally, you will need one array to wrap the data and another to wrap the attributes:
import org.apache.spark.sql.functions.{array, col, struct, to_json}

val jsonData = struct(        // Outermost JSON document
  array(                      // data field as an array
    struct(                   // Intermediate JSON document with the attributes field
      array(                  // Innermost array
        struct(               // Innermost JSON document
          $"uuid",            // Payload; you can use df.columns.map(col): _* instead
          $"some_data"
        )
      ) as "attributes"       // Alias for the innermost array field
    )
  ) as "data"                 // Alias for the outer array field
)
Combined:
Seq(("A", "ABC"))
.toDF("uuid", "some_data")
.select(to_json(jsonData) as "data")
.show(false)
+----------------------------------------------------------+
|data |
+----------------------------------------------------------+
|{"data":[{"attributes":[{"uuid":"A","some_data":"ABC"}]}]}|
+----------------------------------------------------------+
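If the DataFrame has many columns, the innermost struct can be built generically from `df.columns` instead of naming each field, as the comment in the snippet above suggests. A minimal self-contained sketch, assuming a local `SparkSession` (the `master("local[*]")` setting is only for demonstration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, struct, to_json}

val spark = SparkSession.builder().master("local[*]").appName("nested-json").getOrCreate()
import spark.implicits._

val df = Seq(("A", "ABC"), ("B", "DEF")).toDF("uuid", "some_data")

// Same nested structure as above, but the payload is derived from df.columns
val jsonData = struct(
  array(
    struct(
      array(
        struct(df.columns.map(col): _*)  // all columns, in DataFrame order
      ) as "attributes"
    )
  ) as "data"
)

df.select(to_json(jsonData) as "value").show(false)
```

Because `struct` preserves the column order of `df.columns`, the JSON keys come out in the same order as the original schema.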