
Spark: remove keys of a map column

I have a dataset ds containing a map column userInfo that has 3 keys:

{"name":"Tom", "age":"52", "phone":"45124"}

Now I want my new dataset dsNew (e.g. dsNew = ds.withColumn(...)) to have this map column as well, but without the key "phone". What is the best way to do this, performance-wise?

Note that this is only a simplified example; in reality this map column has many key-value pairs and I want to remove a subset of them.

Thanks.

In addition to my linked answers, which use a UDF, if your Spark version is >= 3.0 you can also use map_filter:

dsNew = ds.withColumn("userInfo", expr("map_filter(userInfo, (k, v) -> k != 'phone')"))

If you have several keys to remove, you can use not in:

dsNew = ds.withColumn("userInfo", expr("map_filter(userInfo, (k, v) -> k not in ('phone', 'phone2'))"))
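When the keys to remove are held in a Scala collection, the same filter can be written with the typed map_filter function rather than a SQL string. This is a minimal sketch, assuming Spark 3.0 or later; the local SparkSession, the sample row and the name keysToRemove are illustrative and not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, map_filter}

// Local session for illustration only (assumption: Spark >= 3.0 on the classpath)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-filter-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data shaped like the question's example
val ds = Seq(Map("name" -> "Tom", "age" -> "52", "phone" -> "45124")).toDF("userInfo")

// Hypothetical list of keys to drop; "phone2" is absent from the sample row
val keysToRemove = Seq("phone", "phone2")

// Keep only the entries whose key is not in keysToRemove
val dsNew = ds.withColumn(
  "userInfo",
  map_filter(col("userInfo"), (k, _) => !k.isin(keysToRemove: _*))
)
```

Because the lambda is built from Column expressions, the list of keys can come from configuration or another computation without any string concatenation.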

If you have only 3 keys, it's simple to create a new map from the 2 keys you want to keep:

val dsNew = ds.withColumn(
  "userInfo",
  map(lit("name"), $"userInfo"("name"), lit("age"), $"userInfo"("age"))
)

dsNew.show
//+------------------------+
//|userInfo                |
//+------------------------+
//|[name -> Tom, age -> 52]|
//+------------------------+
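The same idea extends to a longer list of keys to keep by generating the alternating key/value arguments for map. This is a minimal sketch; the local SparkSession, the sample row and the name keysToKeep are illustrative assumptions, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, map}

// Local session for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-keep-keys-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data shaped like the question's example
val ds = Seq(Map("name" -> "Tom", "age" -> "52", "phone" -> "45124")).toDF("userInfo")

// Hypothetical list of keys to keep
val keysToKeep = Seq("name", "age")

// Build lit(key), userInfo(key) pairs and pass them all to map(...)
val keyValueCols = keysToKeep.flatMap(k => Seq(lit(k), col("userInfo")(k)))
val dsNew = ds.withColumn("userInfo", map(keyValueCols: _*))
```

Unlike filtering, this always produces an entry for every key in keysToKeep, so it fits best when those keys are known to be present in every row.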

As pointed out in the other answer, map filtering is only available since Spark 3. In Spark 2.4, you can get the keys, filter them using array functions, then create a new map from the filtered keys using the map_from_arrays function:

val dsNew = ds.withColumn(
    "filtered_keys",
    expr("filter(map_keys(userInfo), x -> x <> 'phone')")
  ).withColumn(
    "userInfo",
    map_from_arrays(
      $"filtered_keys",
      expr("transform(filtered_keys, x -> userInfo[x])")
    )
  ).drop("filtered_keys")
