简体   繁体   中英

Flattening the array of a dataframe column into separate columns and corresponding values in Spark scala

Below is my dataframe. Im getting this dataframe by converting the java Json list to dataframe.

+---+--------------------------------------------------------------------------------------------------+
|   |                              Value                                                              |
+---+---+----------------------------------------------------------------------------------------------+
|  1| {"Name":"john","type":"On(r) TV: Channel","desc_lang":"en~en~en~en","copyright":"Copyright 2021"}|
|-- --------------------------------------------------------------------------------------------------+
|  2| {"Name":"Dane","type":"On(r) TV: Prgrm","desc_lang":"FR~FR~FR~FR","copyright":"Copyright 2022"}  |
+------------------------------------------------------------------------------------------------------+

The required output is as below.

+---+---------------------+---+-------+----------------+
|Name|    type            | desc_lang | copyright      |
+----+--------------------+-----------+-------+--------+
|john| On(r) TV: Channel  | en~en~en~en |Copyright 2021|
|Dane| On(r) TV: Prgrm    | FR~FR~FR~FR |Copyright 2022|
+----+--------------------+-------------+--------------+

This is only the sample data, I actually have around 180 columns that needs to be flattened to the above tabular format. Below is the code I tried splitting, but it did not given me the desired output.

val dfcollect = DF.withColumn("finalop", split($"Value", ":"))

Could someone please assist on how to achieve this output.

You can use from_json and star expand the resulting struct:

val df2 = df.select(
    from_json(
        col("Value"), 
        schema_of_json(df.select("Value").head().getString(0))
    ).as("Value")
).select("Value.*")

df2.show
+----+-------------+-----------+---------------+
|Name|    copyright|  desc_lang|           type|
+----+-------------+-----------+---------------+
|john|Copyright2021|en~en~en~en|On(r)TV:Channel|
|Dane|Copyright2022|FR~FR~FR~FR|  On(r)TV:Prgrm|
+----+-------------+-----------+---------------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM