I have a DataFrame with the following data:
+-----------+-------+-----+
| file_name | key   |Value|
+-----------+-------+-----+
| file1     | key1  | 7   |
| file1     | key2  | 11  |
| file1     | key3  | 3   |
| file2     | key1  | 9   |
| file2     | key2  | 2   |
| file2     | key3  | 10  |
+-----------+-------+-----+
With the following code I have solved one step of my problem:
dataset.select(col("file_name"), to_json(struct(col("key").alias("key"),col("value").alias("value"))).alias("output"))
.groupBy(col("file_name")).agg(collect_list(col("output")).alias("output"))
.show(false);
This gives me the following output:
+-----------+-------------------------------------------------------------------------------------+
| file_name | output                                                                              |
+-----------+-------------------------------------------------------------------------------------+
| file1     |[{"key":"key1","value":"7"}, {"key":"key2","value":"11"}, {"key":"key3","value":"3"}]|
| file2     |[{"key":"key1","value":"9"}, {"key":"key2","value":"2"}, {"key":"key3","value":"10"}]|
+-----------+-------------------------------------------------------------------------------------+
But I want my final output in the following JSON structure (a JSON object holding a JSON array). Can you suggest what changes I need to get the output in this format?
+-----------+----------------------------------------------------------------------------------------------+
| file_name | output                                                                                       |
+-----------+----------------------------------------------------------------------------------------------+
| file1     |{"result":[{"key":"key1","value":"7"},{"key":"key2","value":"11"},{"key":"key3","value":"3"}]}|
| file2     |{"result":[{"key":"key1","value":"9"},{"key":"key2","value":"2"},{"key":"key3","value":"10"}]}|
+-----------+----------------------------------------------------------------------------------------------+
Try adding another select statement: select(col("file_name"), to_json(struct(col("output").alias("result"))).alias("output"))
The code should be something like:
dataset.select(col("file_name"), to_json(struct(col("key").alias("key"),col("value").alias("value"))).alias("output"))
.groupBy(col("file_name")).agg(collect_list(col("output")).alias("output"))
.select(col("file_name"), to_json(struct(col("output").alias("result"))).alias("output"))
.show(false);
You can put the result inside a struct before calling to_json. Note that you shouldn't call to_json twice, because that will result in doubly escaped quotes.
dataset.groupBy("file_name").agg(
    to_json(
        struct(
            collect_list(struct("key", "value")).alias("result")
        )
    ).alias("output")
).show(false)
+---------+----------------------------------------------------------------------------------------------+
|file_name|output |
+---------+----------------------------------------------------------------------------------------------+
|file2 |{"result":[{"key":"key1","value":"9"},{"key":"key2","value":"2"},{"key":"key3","value":"10"}]}|
|file1 |{"result":[{"key":"key1","value":"7"},{"key":"key2","value":"11"},{"key":"key3","value":"3"}]}|
+---------+----------------------------------------------------------------------------------------------+
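To make the note about doubly escaped quotes concrete, here is a minimal plain-Java sketch with no Spark dependency. It is not Spark's implementation; `quote`, `toObject`, `wrapRaw`, and `wrapAsStrings` are hypothetical helpers written only for this illustration. It mimics what happens when elements that are already JSON text are serialized a second time as ordinary strings, compared with embedding them once as structured values.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DoubleEncodingDemo {

    // Hypothetical helper: encode a Java string as a JSON string literal.
    // Only backslashes and double quotes are escaped, which is enough for this demo.
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Hypothetical helper: serialize one (key, value) pair as a JSON object.
    public static String toObject(String key, String value) {
        return "{\"key\":" + quote(key) + ",\"value\":" + quote(value) + "}";
    }

    // Single serialization: the elements are embedded as JSON objects.
    public static String wrapRaw(List<String> jsonObjects) {
        return "{\"result\":[" + String.join(",", jsonObjects) + "]}";
    }

    // Double serialization: the elements are treated as plain strings and
    // quoted again, so every inner quote gains a backslash.
    public static String wrapAsStrings(List<String> jsonObjects) {
        String items = jsonObjects.stream()
                .map(DoubleEncodingDemo::quote)
                .collect(Collectors.joining(","));
        return "{\"result\":[" + items + "]}";
    }

    public static void main(String[] args) {
        List<String> rows = List.of(toObject("key1", "7"), toObject("key2", "11"));
        System.out.println(wrapRaw(rows));        // objects nested directly in the array
        System.out.println(wrapAsStrings(rows));  // array of strings with escaped quotes
    }
}
```

The first printed line has the desired shape, with objects nested directly in the "result" array. The second line holds an array of strings whose inner quotes are backslash-escaped, which is the effect to expect when to_json is applied to a column that already contains JSON strings.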