I have the dataset below. Column_1 is comma-separated, and Column_2 and Column_3 are colon-separated. All are string columns. Every comma-separated value in Column_1 should become a separate row, with the corresponding value from Column_2 or Column_3 populated next to it. Only one of Column_2 and Column_3 is populated in any given row, never both.
If the number of values in Column_1 doesn't match the number of values in Column_2 or Column_3, the unmatched entries should be populated with null (see the rows where Column_1 is I,J and K,L).
Column_1  Column_2  Column_3
A,B,C,D   NULL      N1:N2:N3:N4
E,F       N5:N6     NULL
G         NULL      N7
H         NULL      NULL
I,J       NULL      N8
K,L       N9        NULL
I have to convert the delimited values into rows as below.
Column_1 Column_2
A N1
B N2
C N3
D N4
E N5
F N6
G N7
H NULL
I N8
J NULL
K N9
L NULL
Is there a way to achieve this with the Spark Java API, without using UDFs?
Here is a Scala solution; the Java version should be similar. You can combine Column_2 and Column_3 using coalesce, split the result on the appropriate delimiter, use arrays_zip to pair the two arrays element by element (the shorter array is padded with nulls), and explode the result into rows.
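import org.apache.spark.sql.functions._

df.select(
  explode(
    arrays_zip(
      // Split Column_1 on commas
      split(col("Column_1"), ","),
      // Only one of Column_2 / Column_3 is populated: take it and split on colons.
      // If both are null, use an empty array so arrays_zip pads with nulls.
      coalesce(
        split(coalesce(col("Column_2"), col("Column_3")), ":"),
        array()
      )
    )
  ).alias("result")
).select(
  // Expand the zipped struct into two columns
  "result.*"
).toDF(
  "Column_1", "Column_2"
).show()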
+--------+--------+
|Column_1|Column_2|
+--------+--------+
|       A|      N1|
|       B|      N2|
|       C|      N3|
|       D|      N4|
|       E|      N5|
|       F|      N6|
|       G|      N7|
|       H|    null|
|       I|      N8|
|       J|    null|
|       K|      N9|
|       L|    null|
+--------+--------+
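To make the per-row logic easier to follow, here is a minimal plain-Java sketch of what the coalesce / split / arrays_zip / explode chain does to a single row. It is not Spark code and does not use the Spark API: the class name ZipExplodeSketch and the method explodeRow are hypothetical names chosen for illustration, and the sketch assumes Column_1 is never null.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the per-row logic; this is NOT Spark code.
// ZipExplodeSketch and explodeRow are hypothetical names for illustration.
class ZipExplodeSketch {

    // Returns one {column1Value, pairedValue} pair per position.
    // Assumes col1 is non-null; col2 and col3 may be null.
    static List<String[]> explodeRow(String col1, String col2, String col3) {
        // Models coalesce(Column_2, Column_3): take the first non-null one.
        String populated = (col2 != null) ? col2 : col3;
        // Models the fallback to an empty array when both are null.
        String[] values = (populated == null) ? new String[0] : populated.split(":");
        String[] keys = col1.split(",");

        // Models arrays_zip: pair by position, padding the shorter side with null.
        int n = Math.max(keys.length, values.length);
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String key = (i < keys.length) ? keys[i] : null;
            String value = (i < values.length) ? values[i] : null;
            // Models explode: each pair becomes its own output row.
            rows.add(new String[] {key, value});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Row "I,J" / NULL / "N8" from the question
        for (String[] row : explodeRow("I,J", null, "N8")) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Running main on the I,J row prints "I N8" followed by "J null", which matches the corresponding rows in the output above.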
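Here's another way: with the transform function you can iterate over the elements of Column_1 together with their index, build a one-entry map from each element to mappings[i], and explode the result afterwards. This relies on mappings[i] evaluating to null when the index is out of range, which is Spark's behaviour when ANSI mode (spark.sql.ansi.enabled) is off: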
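import org.apache.spark.sql.functions._

df.withColumn(
  // Values from whichever of Column_2 / Column_3 is populated
  "mappings",
  split(coalesce(col("Column_2"), col("Column_3")), ":")
).selectExpr(
  // One single-entry map per Column_1 element: element -> value at the same index
  "explode(transform(split(Column_1, ','), (x, i) -> map(x, mappings[i]))) as mappings"
).selectExpr(
  // Turn each map entry into a (Column_1, Column_2) row
  "explode(mappings) as (Column_1, Column_2)"
).show()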
//+--------+--------+
//|Column_1|Column_2|
//+--------+--------+
//|       A|      N1|
//|       B|      N2|
//|       C|      N3|
//|       D|      N4|
//|       E|      N5|
//|       F|      N6|
//|       G|      N7|
//|       H|    null|
//|       I|      N8|
//|       J|    null|
//|       K|      N9|
//|       L|    null|
//+--------+--------+