
PySpark: How to Split a Column into 3 Columns

I have a Spark DataFrame as below and would like to split the column into 3 columns by space.

+------------+
|        text|
+------------+
|  aaa bb ccc|
+------------+
|  aaa bb c d|
+------------+
|        aa b|
+------------+

Below is the expected outcome. The first item goes into the text1 column, the second into text2, and the rest (if any) into text3. The original column can contain null records or values with any number of separators (spaces, " ").

+------------+-----+-----+-----+
|        text|text1|text2|text3|
+------------+-----+-----+-----+
|  aaa bb ccc| aaa | bb  | ccc |
+------------+-----+-----+-----+
|  aaa bb c d| aaa | bb  | c d |
+------------+-----+-----+-----+
|        aa b| aa  | b   | null|
+------------+-----+-----+-----+
|        aa  | aa  |null | null|
+------------+-----+-----+-----+
|            | null|null | null|
+------------+-----+-----+-----+

Thanks in advance!

You can use the split function with a limit argument (the limit argument requires Spark 3.0 or later).

from pyspark.sql import functions as F

# Split on spaces with limit=3 so the third element keeps any remaining text (including spaces) intact.
arr_cols = [F.split('text', ' ', 3)[i].alias('text' + str(i + 1)) for i in range(3)]
df = df.select('text', *arr_cols)
df.show(truncate=False)
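
For context, here is a minimal, self-contained sketch of the same approach; the sample rows, the DataFrame name, and the SparkSession setup are assumptions added for illustration and are not part of the original answer.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data mirroring the rows shown in the question.
df = spark.createDataFrame(
    [("aaa bb ccc",), ("aaa bb c d",), ("aa b",), ("aa",), (None,)],
    ["text"],
)

# split with limit=3 produces an array of at most 3 elements; indexing past the
# end of a shorter array (or a null value) yields null, matching the expected output.
arr_cols = [F.split('text', ' ', 3)[i].alias('text' + str(i + 1)) for i in range(3)]
df.select('text', *arr_cols).show(truncate=False)

Note that split treats its pattern as a regular expression, so a separator other than a plain space may need escaping.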
