
How can I unpack a column of type list in PySpark?

I have a DataFrame in PySpark with a column of type array of strings. I need to generate a new column containing the head of the list, and another column with the tail elements concatenated into a single string.

This is my original dataframe:

pyspark> df.show()
+---+------------+
| id|     lst_col|
+---+------------+
|  1|[a, b, c, d]|
+---+------------+


pyspark> df.printSchema()
root
 |-- id: integer (nullable = false)
 |-- lst_col: array (nullable = true)
 |    |-- element: string (containsNull = true)

And I need to generate something like this:

pyspark> df2.show()
+---+--------+---------------+
| id|lst_head|lst_concat_tail|
+---+--------+---------------+
|  1|       a|          b,c,d|
+---+--------+---------------+

For Spark 2.4+, you can use the element_at, slice and size functions for arrays:

from pyspark.sql.functions import element_at, expr

df.select("id",
          element_at("lst_col", 1).alias("lst_head"),
          expr("slice(lst_col, 2, size(lst_col))").alias("lst_concat_tail")
         ).show()

Gives:

+---+--------+---------------+
| id|lst_head|lst_concat_tail|
+---+--------+---------------+
|  1|       a|      [b, c, d]|
+---+--------+---------------+
