简体   繁体   中英

split Json array into two rows spark scala

I have a dataframe like this:

root
 |-- runKeyId: string (nullable = true)
 |-- entities: string (nullable = true)
+--------+--------------------------------------------------------------------------------------------+ 
|runKeyId|entities                                                                                    |
+--------+--------------------------------------------------------------------------------------------+ 
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339},{"Partition":{"Name":"DDD"},"id":339}|

and I would like to explode into this with scala:

+--------+--------------------------------------------------------------------------------------------+
|runKeyId|entities                                                                                    |
+--------+--------------------------------------------------------------------------------------------+
|1       |{"Partition":[{"Name":"ABC"},{"Name":"DBC"}],"id":339}
+--------+--------------------------------------------------------------------------------------------+
|2       |{"Partition":{"Name":"DDD"},"id":339}
+--------+--------------------------------------------------------------------------------------------+

Looks like you don't have a valid JSON, So fix the JSON first and then you can read as JSON and explode it as below.

val df = Seq(
  ("1", "{\"Partition\":[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}],\"id\":339},{\"Partition\":{\"Name\":\"DDD\"},\"id\":339}")
).toDF("runKeyId", "entities")
  .withColumn("entities", concat(lit("["), $"entities", lit("]"))) //fix the json 


val resultDF = df.withColumn("entities",
  explode(from_json($"entities", schema_of_json(df.select($"entities").first().getString(0))))
).withColumn("entities", to_json($"entities"))


resultDF.show(false)

Output:

+--------+----------------------------------------------------------------+
|runKeyId|entities                                                        |
+--------+----------------------------------------------------------------+
|1       |{"Partition":"[{\"Name\":\"ABC\"},{\"Name\":\"DBC\"}]","id":339}|
|1       |{"Partition":"{\"Name\":\"DDD\"}","id":339}                     |
+--------+----------------------------------------------------------------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM