
How to remove key from map in spark data frame scala

I am working on a Spark data frame. There are 60 columns in my data frame, and below is a sample map column. I need to remove the 'N/A' key from the map, but I haven't found any function to do this.

+----------------------------------+
| userbytestsample                 |
+----------------------------------+
|[TEST  -> 2000050008, N/A ->]     |
+----------------------------------+


Schema:
 |-- userbytestsample: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

Output:

 +----------------------------------+
 | userbytestsample                 |
 +----------------------------------+
 |[TEST  -> 2000050008]             |
 +----------------------------------+

I suggest you use a UDF.

Let's suppose I have this dataframe:

import org.apache.spark.sql.functions.{col, udf}  // used below; in spark-shell, spark.implicits._ (needed for toDF) is already imported

val df = Seq(Map("a" -> 1, "b" -> 2, "c" -> 3), Map("a" -> 10, "ff" -> 2, "gg" -> 30)).toDF("colmap")

scala> df.printSchema
root
 |-- colmap: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)

df.show(false)
+----------------------------+
|colmap                      |
+----------------------------+
|[a -> 1, b -> 2, c -> 3]    |
|[a -> 10, ff -> 2, gg -> 30]|
+----------------------------+

If I want to remove the key "a":

val unwantedKey: String = "a"

I create a UDF which takes the 'colmap' column, removes the key, and returns the map without it:

// UDF that removes the given key from a Map[String, Int] column
def updateMap(unwantedKey: String) = udf((colMapName: Map[String, Int]) => colMapName - unwantedKey)

Finally, to apply this UDF you can call it this way:

val finalDF = df.withColumn("newcol", updateMap(unwantedKey)(col("colmap")))
finalDF.show(false)
+----------------------------+-------------------+
|colmap                      |newcol             |
+----------------------------+-------------------+
|[a -> 1, b -> 2, c -> 3]    |[b -> 2, c -> 3]   |
|[a -> 10, ff -> 2, gg -> 30]|[ff -> 2, gg -> 30]|
+----------------------------+-------------------+
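To apply this directly to the question's column, here is a minimal sketch (removeNA and cleaned are hypothetical names) assuming the Map[String, String] schema from the question; wrapping the map in Option guards against null rows, since the column is nullable:

val removeNA = udf { (m: Map[String, String]) =>
  // return null for a null map instead of throwing a NullPointerException
  Option(m).map(_ - "N/A").orNull
}

val cleaned = df.withColumn("userbytestsample", removeNA(col("userbytestsample")))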

For Spark 3+ you can use map_filter:

df.select(map_filter($"userbytestsample", (k, v) => !k.equalTo("N/A")).as("userbytestsample"))
  .show(false)

Output:

 +----------------------------------+
 | userbytestsample                 |
 +----------------------------------+
 |[TEST  -> 2000050008]             |
 +----------------------------------+
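If several keys need to be dropped, the same lambda extends naturally. A sketch, where "UNKNOWN" is just a hypothetical second key:

df.select(map_filter($"userbytestsample", (k, v) => !k.isin("N/A", "UNKNOWN")).as("userbytestsample"))
  .show(false)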

For Spark 2.4+ you might need a UDF:

// UDF that drops any key equal (case-insensitively) to "N/A"
val map_filter_udf = udf { (xs: Map[String, String]) =>
  xs.filter { case (k, _) => !k.equalsIgnoreCase("N/A") }
}

df.select(map_filter_udf($"userbytestsample").as("userbytestsample"))
  .show(false)

Output:

 +----------------------------------+
 | userbytestsample                 |
 +----------------------------------+
 |[TEST  -> 2000050008]             |
 +----------------------------------+
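Alternatively, higher-order functions such as map_filter were (to my recollection) already exposed through SQL expressions in Spark 2.4, so you may be able to avoid the UDF there too by going through expr:

import org.apache.spark.sql.functions.expr

df.select(expr("map_filter(userbytestsample, (k, v) -> k != 'N/A')").as("userbytestsample"))
  .show(false)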
