How can I unpack a column of type list in PySpark?
I have a DataFrame in PySpark with a column of type array of string. I need to generate a new column containing the head of the list, and another column with the concatenation of the tail of the list.
This is my original DataFrame:
pyspark> df.show()
+---+------------+
| id| lst_col|
+---+------------+
| 1|[a, b, c, d]|
+---+------------+
pyspark> df.printSchema()
root
|-- id: integer (nullable = false)
|-- lst_col: array (nullable = true)
| |-- element: string (containsNull = true)
And I need to generate something like this:
pyspark> df2.show()
+---+--------+---------------+
| id|lst_head|lst_concat_tail|
+---+--------+---------------+
| 1| a| b,c,d|
+---+--------+---------------+
For Spark 2.4+, you can use the element_at, slice and size functions for arrays:
from pyspark.sql.functions import element_at, expr

df.select(
    "id",
    element_at("lst_col", 1).alias("lst_head"),
    expr("slice(lst_col, 2, size(lst_col))").alias("lst_concat_tail"),
).show()
Gives:
+---+--------+---------------+
| id|lst_head|lst_concat_tail|
+---+--------+---------------+
| 1| a| [b, c, d]|
+---+--------+---------------+