Spark/Scala transform [map of array] to [map of map]
I am looking to change the way data is stored in one of my dataframe's columns. The column "content-value" currently has this type:
|-- content-value: map (nullable = true)
| |-- key: integer
| |-- value: array (valueContainsNull = true)
| | |-- element: string (containsNull = true)
And the data is currently stored like this:
{4 -> [5191, 57, -46, POS2], 5 -> [5413, 56, 48, POS2], 2 -> [5421, -59, 47, POS2], 1 -> [5237, -59, -47, POS2], 3 -> [5153, -10, 42, POS1]}
I would like to change that to a map of maps that would look like:
{4 -> {value -> 5191, x -> 57, y -> -46, pos -> POS2}, 5 -> {value -> 5413, x -> 56, y -> 48, pos -> POS2}, 2 -> {value -> 5421, x -> -59, y -> 47, pos -> POS2}, 1 -> {value -> 5237, x -> -59, y -> -47, pos -> POS2}, 3 -> {value -> 5153, x -> -10, y -> 42, pos -> POS1}}
I've tried creating a new column with the keys ["value", "x", "y", "pos"] and using map_from_array, without success.
Would love some help!
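In plain Scala terms, the per-entry conversion being asked for can be sketched like this (field names taken from the desired output above; this is the row-level logic only, not Spark-specific):

```scala
// Field names assumed from the desired output in the question.
val fieldKeys = Seq("value", "x", "y", "pos")

// Convert one row's map of arrays into a map of maps by zipping
// each array's elements with the field names.
def toNestedMap(row: Map[Int, Array[String]]): Map[Int, Map[String, String]] =
  row.map { case (k, arr) => k -> fieldKeys.zip(arr).toMap }
```

A function like this can then be applied per row with `Dataset.map`.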
With a Dataset:
import spark.implicits._

case class Value(value: String, x: String, y: String, pos: String)

val ds = spark.createDataset[Map[Int, Array[String]]](
  Seq(Map(4 -> Array("5191", "57", "-46", "POS2"))))

// Rebuild each entry, turning the positional array into a named case class.
val dsFinal = ds.map(el =>
  el.flatMap { case (key, value) =>
    Map(key -> Value(value(0), value(1), value(2), value(3)))
  })
It gives:
+----------------------------+
|value |
+----------------------------+
|{4 -> {5191, 57, -46, POS2}}|
+----------------------------+
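Note that this stores each value as a struct (`Value`) rather than a map. If a genuine map of maps is needed, the case class can be flattened into a `Map[String, String]` first (a sketch, assuming Scala 2.13+ for `productElementNames`):

```scala
// Same case class as in the answer, with a helper that flattens it
// into field-name -> field-value pairs (requires Scala 2.13+).
case class Value(value: String, x: String, y: String, pos: String) {
  def toFieldMap: Map[String, String] =
    productElementNames.zip(productIterator.map(_.toString)).toMap
}
```

Used inside the same `ds.map`, e.g. `el.map { case (k, v) => k -> Value(v(0), v(1), v(2), v(3)).toFieldMap }`, the resulting column should then be encoded as `map<int, map<string, string>>`.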