How to remove a key from a map column in a Spark DataFrame (Scala)
I am working with a Spark DataFrame. My DataFrame has 60 columns, and below is a sample of its map column. I need to remove the 'N/A' key from the map, but I haven't found any function to do this.
+----------------------------------+
| userbytestsample |
+----------------------------------+
|[TEST -> 2000050008, N/A ->] |
+----------------------------------+
Schema:
|-- userbytestsample: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
Expected output:
+----------------------------------+
| userbytestsample |
+----------------------------------+
|[TEST -> 2000050008] |
+----------------------------------+
I suggest you use a UDF.
Let's suppose I have this DataFrame:
val df = Seq((Map("a" -> 1, "b" -> 2, "c" -> 3)), (Map("a" -> 10, "ff" -> 2, "gg" -> 30))).toDF("colmap")
scala> df.printSchema
root
|-- colmap: map (nullable = true)
| |-- key: string
| |-- value: integer (valueContainsNull = false)
df.show(false)
+----------------------------+
|colmap |
+----------------------------+
|[a -> 1, b -> 2, c -> 3] |
|[a -> 10, ff -> 2, gg -> 30]|
+----------------------------+
If I want to remove the key "a":
val unwantedKey: String = "a"
I create a UDF that takes the 'colmap' column, removes the key, and returns the map without it:
import org.apache.spark.sql.functions.{col, udf}

def updateMap(unwantedKey: String) = udf((colMap: Map[String, Int]) =>
  colMap - unwantedKey  // removing a key is a no-op if the key is absent
)
Finally, to apply this UDF you can call it this way:
val finalDF = df.withColumn("newcol", updateMap(unwantedKey)(col("colmap")))
finalDF.show(false)
+----------------------------+-------------------+
|colmap |newcol |
+----------------------------+-------------------+
|[a -> 1, b -> 2, c -> 3] |[b -> 2, c -> 3] |
|[a -> 10, ff -> 2, gg -> 30]|[ff -> 2, gg -> 30]|
+----------------------------+-------------------+
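One caveat worth adding: the question's schema marks the map column as nullable, and a Scala UDF receives null for null rows, so the version above would throw a NullPointerException on them. A minimal null-safe sketch of the same idea (the Option guard is my addition, not part of the original answer):

import org.apache.spark.sql.functions.udf

// Null-safe variant: Option(...) turns a null map into None,
// and orNull maps None back to a null column value
def updateMapNullSafe(unwantedKey: String) = udf((m: Map[String, Int]) =>
  Option(m).map(_ - unwantedKey).orNull
)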
For Spark 3+ you can use map_filter:
df.select(map_filter($"userbytestsample", (k, v) => !k.equalTo("N/A")).as("userbytestsample"))
.show(false)
Output:
+----------------------------------+
| userbytestsample |
+----------------------------------+
|[TEST -> 2000050008] |
+----------------------------------+
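Since the question's DataFrame has 60 columns, a bare select keeps only the map column; here is a small sketch (standard Spark API, assuming the same df) that overwrites the column in place and keeps everything else:

import org.apache.spark.sql.functions.map_filter

// Keep all other columns and rewrite only the map column
val cleaned = df.withColumn(
  "userbytestsample",
  map_filter($"userbytestsample", (k, _) => k =!= "N/A")
)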
For Spark 2.4+ you might need a UDF:
val map_filter_udf = udf { (xs: Map[String, String]) =>
  // keep every entry whose key is not "N/A" (case-insensitive)
  xs.filter { case (k, _) => !k.equalsIgnoreCase("N/A") }
}
df.select(map_filter_udf($"userbytestsample").as("userbytestsample"))
  .show(false)
Output:
+----------------------------------+
| userbytestsample |
+----------------------------------+
|[TEST -> 2000050008] |
+----------------------------------+
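One more option for Spark 2.4: if I recall the version history correctly, map_filter already exists there as a SQL built-in (only the Scala Column DSL binding arrived in 3.0), so you can also skip the UDF by going through expr:

import org.apache.spark.sql.functions.expr

// Spark SQL's higher-order-function syntax: (k, v) -> predicate
df.select(expr("map_filter(userbytestsample, (k, v) -> k != 'N/A')").as("userbytestsample"))
  .show(false)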