简体   繁体   中英

Spark Explode a row containing multiple metrics using Map

I am using Spark with Java and have a dataframe like this:

id | sent  | delivered | opened
--------------------------------
1  | 5     | 3         | 2
2  | 11    | 9         | 4

I want to get something like:

id  | metric_name | metric_value
--------------------------------
1   | sent        |  5
1   | delivered   |  3
1   | opened      |  2
2   | sent        |  11
2   | delivered   |  9
2   | opened      |  4

This does the trick:

df = df.selectExpr(
  "id",
  "stack(3, 'sent', sent, 'delivered', delivered, 'opened', opened) as (metric_name, metric_value)"
)

Final output:

+---+-----------+------------+
|id |metric_name|metric_value|
+---+-----------+------------+
|1  |sent       |5           |
|1  |delivered  |3           |
|1  |opened     |2           |
|2  |sent       |11          |
|2  |delivered  |9           |
|2  |opened     |4           |
+---+-----------+------------+

Good luck!

Create a struct columns combining column names and values of the columns you want to melt. Explode using inline

 df1 =df.withColumn('tab',F.array(*[F.struct(F.lit(x).alias('metric_name'), F.col(x).alias('metric_value ')) for x in df.columns if x != 'id'])).selectExpr('id','inline(tab)')

df1.show(truncate=False)

+---+-----------+-------------+
|id |metric_name|metric_value |
+---+-----------+-------------+
|1  |sent       |5            |
|1  |delivered  |3            |
|1  |opened     |2            |
|2  |sent       |11           |
|2  |delivered  |9            |
|2  |opened     |4            |
+---+-----------+-------------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM