
How to concat 2 columns of ArrayType on axis = 1 in a PySpark dataframe?

I have the following dataframe:

I would like to concatenate lat and lon into a single list. Here mmsi is similar to an ID (it is unique):

+---------+--------------------+--------------------+
|     mmsi|                 lat|                 lon|
+---------+--------------------+--------------------+
|255801480|[47.1018366666666...|[-5.3017783333333...|
|304182000|[44.6343033333333...|[-63.564803333333...|
|304682000|[41.1936, 41.1715...|[-8.7716, -8.7514...|
|305930000|[49.5221333333333...|[-3.6310166666666...|
|306216000|[42.8185133333333...|[-29.853155, -29....|
|477514400|[47.17205, 47.165...|[-58.6317, -58.60...|
+---------+--------------------+--------------------+

Therefore, I would like to concatenate the lat and lon arrays, but on axis = 1; that is, I would like to end up with a list of lists in a separate column, like:

[[47.1018366666666, -5.3017783333333], ... ]

How could that be done with a PySpark dataframe? I have tried concat, but that returns:

[47.1018366666666, 44.6343033333333, ..., -5.3017783333333, -63.564803333333, ...]
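For reference, a minimal sketch of the attempt (assuming the dataframe above is df): concat on array columns simply appends one array after the other, which is why the result comes out flattened.

from pyspark.sql.functions import concat

# concat on array columns (Spark 2.4+) appends lon's elements after lat's,
# yielding one flat array instead of element-wise pairs
df.withColumn("lat_lon", concat(df.lat, df.lon)).show()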

Any help is much appreciated!

Starting with Spark version 2.4, you can use the built-in function arrays_zip.

from pyspark.sql.functions import arrays_zip

# Pair up the i-th elements of lat and lon into an array of (lat, lon) structs
df.withColumn('zipped_lat_lon', arrays_zip(df.lat, df.lon)).show()
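For context, here is a self-contained sketch of the same approach (the sample coordinates are made up, and the transform step is an optional extra, available in the PySpark API from Spark 3.1, for turning the array of structs into a true array of two-element arrays):

from pyspark.sql import SparkSession
from pyspark.sql.functions import arrays_zip, transform, array

spark = SparkSession.builder.getOrCreate()

# Made-up sample data mirroring the question's schema
df = spark.createDataFrame(
    [(255801480, [47.1018, 47.1020], [-5.3017, -5.3019])],
    ["mmsi", "lat", "lon"],
)

# arrays_zip pairs the i-th elements of lat and lon into an array of structs,
# e.g. [{47.1018, -5.3017}, {47.1020, -5.3019}]
zipped = df.withColumn("zipped_lat_lon", arrays_zip(df.lat, df.lon))
zipped.show(truncate=False)

# Optional: convert each struct to a 2-element array to get array<array<double>>
as_lists = zipped.withColumn(
    "lat_lon",
    transform("zipped_lat_lon", lambda s: array(s["lat"], s["lon"])),
)
as_lists.show(truncate=False)

This stays entirely within Spark's built-in functions, so no Python UDF (and its serialization overhead) is needed.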
