
Create a map column in Apache Spark from other columns

I searched this quite a bit but cannot find anything that I can adapt to my situation. I have a dataframe like so:

+-----------------+---------------+
|             keys|         values|
+-----------------+---------------+
|[one, two, three]|[101, 202, 303]|
+-----------------+---------------+

The keys column holds an array of strings; the values column holds an array of ints.

I want to create a new column that contains a map of keys to values, like so:

+-----------------+---------------+---------------------------+
|             keys|         values|                        map|
+-----------------+---------------+---------------------------+
|[one, two, three]|[101, 202, 303]|Map(one->101, two->202, etc|
+-----------------+---------------+---------------------------+

I've been looking at this question, but am not sure it can be used as a starting point for my situation: Spark DataFrame columns transform to Map type and List of Map Type

I need this in Scala, please.

Thanks!

You can create a udf similar to the one in the linked question:

    import org.apache.spark.sql.functions.udf

    val toMap = udf((keys: Seq[String], values: Seq[Int]) =>
      keys.zip(values).toMap
    )

and then use it as:

    import spark.implicits._  // for the $"..." column syntax

    df.withColumn("map", toMap($"keys", $"values"))
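A minimal end-to-end sketch of the udf approach, using the sample data from the question (assumes a local SparkSession; the variable names are illustrative, not from the original answer):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Recreate the dataframe shown in the question
    val df = Seq(
      (Seq("one", "two", "three"), Seq(101, 202, 303))
    ).toDF("keys", "values")

    // Zip the two arrays row-wise into a Scala Map
    val toMap = udf((keys: Seq[String], values: Seq[Int]) => keys.zip(values).toMap)

    df.withColumn("map", toMap($"keys", $"values")).show(false)

Note that `zip` silently truncates to the shorter of the two arrays, so rows where keys and values differ in length will not fail, but will drop the extra elements.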

Since Spark 2.4 there is a built-in function for this in org.apache.spark.sql.functions: def map_from_arrays(keys: Column, values: Column): Column
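With the built-in function, the same result needs no udf (a sketch, assuming the dataframe from the question and Spark 2.4+):

    import org.apache.spark.sql.functions.map_from_arrays
    import spark.implicits._  // for the $"..." column syntax

    df.withColumn("map", map_from_arrays($"keys", $"values")).show(false)

Unlike the `zip`-based udf, map_from_arrays throws an error at runtime if the two arrays in a row have different lengths, which may be preferable when mismatches indicate bad data.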



 