
How to remove key from map in spark data frame scala

I am working on a Spark data frame. There are 60 columns in my data frame, and below is a sample map column. I need to remove the 'N/A' key from the map, but I haven't found any function to do this.

+----------------------------------+
| userbytestsample                 |
+----------------------------------+
|[TEST  -> 2000050008, N/A ->]     |
+----------------------------------+


Schema:
 |-- userbytestsample: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Output:

 +----------------------------------+
 | userbytestsample                 |
 +----------------------------------+
 |[TEST  -> 2000050008]             |
 +----------------------------------+

I suggest you use a UDF.

Let's suppose I have this dataframe:

import org.apache.spark.sql.functions.{col, udf}  // used below; in spark-shell, spark.implicits._ (needed for toDF) is already imported

val df = Seq(Map("a" -> 1, "b" -> 2, "c" -> 3), Map("a" -> 10, "ff" -> 2, "gg" -> 30)).toDF("colmap")

scala> df.printSchema
root
 |-- colmap: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)

df.show(false)
+----------------------------+
|colmap                      |
+----------------------------+
|[a -> 1, b -> 2, c -> 3]    |
|[a -> 10, ff -> 2, gg -> 30]|
+----------------------------+

If I want to remove the key "a":

val unwantedKey: String = "a"

I create a UDF which takes the 'colmap' column, removes the key, and returns the map without it:

// UDF that removes the given key from a Map[String, Int] column
def updateMap(unwantedKey: String) = udf((colMapName: Map[String, Int]) => colMapName - unwantedKey)

Finally, to apply this UDF you can call it this way:

val finalDF = df.withColumn("newcol", updateMap(unwantedKey)(col("colmap")))
finalDF.show(false)
+----------------------------+-------------------+
|colmap                      |newcol             |
+----------------------------+-------------------+
|[a -> 1, b -> 2, c -> 3]    |[b -> 2, c -> 3]   |
|[a -> 10, ff -> 2, gg -> 30]|[ff -> 2, gg -> 30]|
+----------------------------+-------------------+
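To apply this directly to the question's column, here is a minimal sketch (removeNA and cleaned are hypothetical names) assuming the Map[String, String] schema from the question; wrapping the map in Option guards against null rows, since the column is nullable:

val removeNA = udf { (m: Map[String, String]) =>
  // return null for a null map instead of throwing a NullPointerException
  Option(m).map(_ - "N/A").orNull
}

val cleaned = df.withColumn("userbytestsample", removeNA(col("userbytestsample")))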

For Spark 3+ you can use map_filter:

df.select(map_filter($"userbytestsample", (k, v) => !k.equalTo("N/A")).as("userbytestsample"))
  .show(false)

Output:

 +----------------------------------+
 | userbytestsample                 |
 +----------------------------------+
 |[TEST  -> 2000050008]             |
 +----------------------------------+
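If several keys need to be dropped, the same lambda extends naturally. A sketch, where "UNKNOWN" is just a hypothetical second key:

df.select(map_filter($"userbytestsample", (k, v) => !k.isin("N/A", "UNKNOWN")).as("userbytestsample"))
  .show(false)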

For Spark 2.4+ you might need a UDF:

// UDF that drops any key equal (case-insensitively) to "N/A"
val map_filter_udf = udf { (xs: Map[String, String]) =>
  xs.filter { case (k, _) => !k.equalsIgnoreCase("N/A") }
}

df.select(map_filter_udf($"userbytestsample").as("userbytestsample"))
  .show(false)

Output:

 +----------------------------------+
 | userbytestsample                 |
 +----------------------------------+
 |[TEST  -> 2000050008]             |
 +----------------------------------+
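Alternatively, higher-order functions such as map_filter were (to my recollection) already exposed through SQL expressions in Spark 2.4, so you may be able to avoid the UDF there too by going through expr:

import org.apache.spark.sql.functions.expr

df.select(expr("map_filter(userbytestsample, (k, v) -> k != 'N/A')").as("userbytestsample"))
  .show(false)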
