Spark: remove keys of a map column
I have a dataset ds containing a map column userInfo that has 3 keys:
{"name":"Tom", "age":"52", "phone":"45124"}
Now I want my new dataset dsNew to also have this map column (e.g. dsNew = ds.withColumn(...)), but with the key "phone" removed. What is the best way to do this performance-wise?
Note that this is only a simplified example; in reality this map column has many key-value pairs and I want to remove a subset of them.
Thanks.
In addition to my linked answer, which uses a UDF, if your Spark version is >= 3.0 you can also use map_filter:
dsNew = ds.withColumn("userInfo", expr("map_filter(userInfo, (k, v) -> k != 'phone')"))
If you have several keys to remove, you can use not in:
dsNew = ds.withColumn("userInfo", expr("map_filter(userInfo, (k, v) -> k not in ('phone', 'phone2'))"))
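When the keys to drop live in a collection rather than being hard-coded, the same map_filter expression can be assembled from that collection. The following is only a minimal sketch, assuming Spark >= 3.0 and a local SparkSession; the names keysToRemove and inList are hypothetical, and the sample row simply mirrors the one in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a local session for illustration; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("map-filter-sketch").getOrCreate()
import spark.implicits._

// Sample data mirroring the question: a single map column named userInfo.
val ds = Seq(Map("name" -> "Tom", "age" -> "52", "phone" -> "45124")).toDF("userInfo")

// Hypothetical list of keys to drop.
// Assumption: the keys contain no single quotes, so plain quoting is enough.
val keysToRemove = Seq("phone", "phone2")
val inList = keysToRemove.map(k => s"'$k'").mkString(", ")

// Keep only the entries whose key is not in the list.
val dsNew = ds.withColumn(
  "userInfo",
  expr(s"map_filter(userInfo, (k, v) -> k not in ($inList))")
)
dsNew.show(false)
```

With the sample row above, the resulting map should contain only the name and age entries. Because map_filter is a built-in function, the filtering stays inside Spark's own expression engine instead of going through a UDF.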
If you have only 3 keys, it's simple to create a new map by selecting the 2 keys you want to keep:
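import org.apache.spark.sql.functions.{lit, map}
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named spark

// Rebuild the map from only the keys to keep
val dsNew = ds.withColumn(
  "userInfo",
  map(lit("name"), $"userInfo"("name"), lit("age"), $"userInfo"("age"))
)
dsNew.show(false) // false disables truncation of long values
//+------------------------+
//|userInfo                |
//+------------------------+
//|[name -> Tom, age -> 52]|
//+------------------------+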
Map filtering is only available since Spark 3, as pointed out in the other answer. In Spark 2.4, you can get the keys and filter them using array functions, then create a new map from the filtered keys using the map_from_arrays function:
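import org.apache.spark.sql.functions.{expr, map_from_arrays}
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named spark

val dsNew = ds.withColumn(
  // Temporary column holding the keys to keep
  "filtered_keys",
  expr("filter(map_keys(userInfo), x -> x <> 'phone')")
).withColumn(
  "userInfo",
  map_from_arrays(
    $"filtered_keys",
    // Look up the value for each kept key
    expr("transform(filtered_keys, x -> userInfo[x])")
  )
).drop("filtered_keys")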
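For Spark versions older than 2.4, where neither map_filter nor map_from_arrays is available, a UDF is the remaining option mentioned in the first answer. The linked answer itself is not reproduced here, so the following is only a sketch of how such a UDF might look; keysToRemove and dropKeys are hypothetical names, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Assumption: a local session for illustration; in spark-shell this reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
import spark.implicits._

// Sample data mirroring the question: a single map column named userInfo.
val ds = Seq(Map("name" -> "Tom", "age" -> "52", "phone" -> "45124")).toDF("userInfo")

// Hypothetical set of keys to drop.
val keysToRemove = Set("phone", "phone2")

// UDF that returns the map without the unwanted keys; null maps are passed through.
val dropKeys = udf((m: Map[String, String]) =>
  if (m == null) null else m.filter { case (k, _) => !keysToRemove.contains(k) }
)

val dsNew = ds.withColumn("userInfo", dropKeys($"userInfo"))
dsNew.show(false)
```

Since the question asks about performance: a UDF is opaque to Spark's optimizer and each map has to be converted to a Scala object and back, so the built-in functions shown above are generally preferable whenever the Spark version allows them.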