
Extract JSON object column to multiple columns in Spark using Java

I'm using Java 11 with Spark 3.3.0. Let's say I have a DataFrame with two columns, id (a string identifier) and properties (a string containing a single JSON object, e.g. {"foo":123,"bar":"test"}).

How can I "explode" all the name-value pairs of the properties object into multiple columns, e.g. id, foo, bar?

I've seen solutions using select(). But then I would have to take care to select all the existing columns as well, wouldn't I? Is there a simpler way to add the additional columns as needed in one command?

I know I can add a single column at a time if I already know what properties to expect, like this:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.get_json_object;

df = df.withColumn("foo", get_json_object(col("properties"), "$.foo"));
df = df.withColumn("bar", get_json_object(col("properties"), "$.bar"));

Is there a way to add all the necessary columns for the JSON object in the properties column in one fell swoop, even without knowing ahead of time what properties will be in each JSON object?

You can dynamically add the properties of your JSON object as Dataset columns in Java 11 and Spark 3.3.0 as below, provided the properties column is already a struct (if it is a plain JSON string, parse it first, e.g. with from_json):

ds = ds.select("*", "properties.*");

Input:

+---+---+------------+
|id |col|properties  |
+---+---+------------+
|1  |foo|{123, null} |
|2  |bar|{456, test2}|
+---+---+------------+

Output:

+---+---+------------+---+-----+
| id|col|  properties|foo|  bar|
+---+---+------------+---+-----+
|  1|foo| {123, null}|123| null|
|  2|bar|{456, test2}|456|test2|
+---+---+------------+---+-----+
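Since the question's properties column is a JSON string rather than a struct, it has to be parsed before select("*", "properties.*") can expand it. Below is a minimal sketch of that full flow; the class name and the explodeProperties helper are made up for illustration, and inferring the schema via schema_of_json from the first row assumes every row's JSON has the same shape as that sample:

```java
import static org.apache.spark.sql.functions.*;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ExplodeJsonProperties {

    // Parse the JSON-string column into a struct (schema inferred from one
    // sample row), then expand every struct field into its own top-level column.
    static Dataset<Row> explodeProperties(Dataset<Row> df, String jsonCol) {
        // schema_of_json needs a literal, so take one sample value from the data.
        String sample = df.select(jsonCol).first().getString(0);
        df = df.withColumn(jsonCol, from_json(col(jsonCol), schema_of_json(lit(sample))));
        // "*" keeps the existing columns; jsonCol + ".*" expands the struct.
        return df.select("*", jsonCol + ".*").drop(jsonCol);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("explode-json-properties")
                .master("local[*]")
                .getOrCreate();

        // Sample rows matching the question's shape: id plus a JSON string.
        Dataset<Row> df = spark.createDataFrame(
                Arrays.asList(
                        RowFactory.create("1", "{\"foo\":123,\"bar\":\"test\"}"),
                        RowFactory.create("2", "{\"foo\":456,\"bar\":\"test2\"}")),
                new StructType()
                        .add("id", DataTypes.StringType)
                        .add("properties", DataTypes.StringType));

        explodeProperties(df, "properties").show();
        spark.stop();
    }
}
```

Note that because the schema comes from a single sample row, properties absent from that row will not become columns; for heterogeneous data you would need to build a merged schema instead.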

Hope this is what you need, good luck!

