Create a map column in Apache Spark from other columns
I searched this quite a bit but cannot find anything that I can adapt to my situation. I have a dataframe like so:
+-----------------+---------------+
| keys| values|
+-----------------+---------------+
|[one, two, three]|[101, 202, 303]|
+-----------------+---------------+
The keys column holds an array of strings, and the values column holds an array of ints.
I want to create a new column that contains a map of keys to values, like so:
+-----------------+---------------+---------------------------+
| keys| values| map|
+-----------------+---------------+---------------------------+
|[one, two, three]|[101, 202, 303]|Map(one->101, two->202, etc|
+-----------------+---------------+---------------------------+
I've been looking at this question, but I'm not sure it can be used as a starting point for my situation: Spark DataFrame columns transform to Map type and List of Map Type
I need this in Scala, please.

Thanks!
You can create a udf similar to the one in the linked question:
import org.apache.spark.sql.functions.udf

// Zip the two arrays element-wise and build a Map from the resulting pairs
val toMap = udf((keys: Seq[String], values: Seq[Int]) => {
  keys.zip(values).toMap
})
and then use it as:
df.withColumn("map", toMap($"keys", $"values"))
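The udf body is just an element-wise zip followed by toMap, so the core logic can be checked in plain Scala without a Spark session (the example data below mirrors the dataframe above):

```scala
// Pair up keys and values positionally, then build a Map from the pairs.
// Note: zip truncates to the shorter of the two sequences, so mismatched
// array lengths silently drop the extra elements.
val keys = Seq("one", "two", "three")
val values = Seq(101, 202, 303)

val result: Map[String, Int] = keys.zip(values).toMap
```

This is exactly what the udf computes per row once Spark hands it the two array columns as Seqs.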
Since Spark 2.4 there is a built-in version of this in org.apache.spark.sql.functions:

def map_from_arrays(keys: Column, values: Column): Column
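A minimal sketch of the built-in approach, assuming `df` is the dataframe shown above with array columns `keys` and `values` (this needs a Spark 2.4+ runtime to execute):

```scala
import org.apache.spark.sql.functions.{col, map_from_arrays}

// map_from_arrays builds a MapType column directly from two array columns,
// avoiding the serialization overhead of a Scala udf.
val withMap = df.withColumn("map", map_from_arrays(col("keys"), col("values")))
withMap.show(false)
```

Prefer this over the udf where your Spark version allows it, since built-in functions are handled natively by Catalyst and avoid the udf's row-by-row deserialization.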