
Extract JSON object column to multiple columns in Spark using Java

I'm using Java 11 with Spark 3.3.0. Let's say I have a DataFrame with two columns, id (a string identifier) and properties (a string containing a single JSON object, e.g. {"foo":123,"bar":"test"}).

How can I "explode" all the name-value pairs of the properties object into multiple columns, e.g. id, foo, bar?

I've seen solutions using select(). But then I would have to take care to select all the existing columns as well, wouldn't I? Is there a simpler way to add the additional columns as needed in one command?

I know I can add a single column at a time if I already know what properties to expect, like this:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.get_json_object;

df = df.withColumn("foo", get_json_object(col("properties"), "$.foo"));
df = df.withColumn("bar", get_json_object(col("properties"), "$.bar"));

Is there a way to add all the necessary columns for the JSON object in the properties column in one fell swoop, even without knowing ahead of time what properties will be in each JSON object?

You can dynamically add the properties of your JSON object as Dataset columns in Java 11 and Spark 3.3.0 as below, provided the properties column is already a struct (if it is a plain JSON string, parse it first, e.g. with from_json):

ds = ds.select("*", "properties.*");

Input:

+---+---+------------+
|id |col|properties  |
+---+---+------------+
|1  |foo|{123, null} |
|2  |bar|{456, test2}|
+---+---+------------+

Output:

+---+---+------------+---+-----+
| id|col|  properties|foo|  bar|
+---+---+------------+---+-----+
|  1|foo| {123, null}|123| null|
|  2|bar|{456, test2}|456|test2|
+---+---+------------+---+-----+
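Since the question's properties column is a JSON string rather than a struct, it has to be parsed before select("*", "properties.*") can expand it. Below is a minimal sketch of that full flow; the class name and the explodeProperties helper are made up for illustration, and inferring the schema via schema_of_json from the first row assumes every row's JSON has the same shape as that sample:

```java
import static org.apache.spark.sql.functions.*;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ExplodeJsonProperties {

    // Parse the JSON-string column into a struct (schema inferred from one
    // sample row), then expand every struct field into its own top-level column.
    static Dataset<Row> explodeProperties(Dataset<Row> df, String jsonCol) {
        // schema_of_json needs a literal, so take one sample value from the data.
        String sample = df.select(jsonCol).first().getString(0);
        df = df.withColumn(jsonCol, from_json(col(jsonCol), schema_of_json(lit(sample))));
        // "*" keeps the existing columns; jsonCol + ".*" expands the struct.
        return df.select("*", jsonCol + ".*").drop(jsonCol);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("explode-json-properties")
                .master("local[*]")
                .getOrCreate();

        // Sample rows matching the question's shape: id plus a JSON string.
        Dataset<Row> df = spark.createDataFrame(
                Arrays.asList(
                        RowFactory.create("1", "{\"foo\":123,\"bar\":\"test\"}"),
                        RowFactory.create("2", "{\"foo\":456,\"bar\":\"test2\"}")),
                new StructType()
                        .add("id", DataTypes.StringType)
                        .add("properties", DataTypes.StringType));

        explodeProperties(df, "properties").show();
        spark.stop();
    }
}
```

Note that because the schema comes from a single sample row, properties absent from that row will not become columns; for heterogeneous data you would need to build a merged schema instead.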

Hope this is what you need, good luck!

