How to access map values and keys stored in a DataFrame in Scala Spark
I have a table whose description is as follows:
# col_name       data_type                                      comment
id               string
persona_model    map<string,struct<score:double,tag:string>>

# Partition Information
# col_name       data_type    comment
process_date     string
A sample row looks like this (tab-separated):
000000E91010441BB122402A45D439E7 {"Tech":{"score":0.21678,"tag":"OTHERS"}} 2018-05-16-01
Now I want to form another table with only two columns: id and its respective score.
How can I do it in Scala Spark?
Moreover, what's really bugging me is how I can access just one particular score and store it in a variable, let's say temp.
You can do this:
val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))
then you can extract the temp values easily.
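If what you want is the score as a plain Scala value rather than a Spark column, the nested lookup itself can be sketched without a cluster. Here `Score` and `personaModel` are stand-ins for what one deserialized `persona_model` cell contains:

```scala
// Stand-in for the struct<score:double,tag:string> values in the map column.
case class Score(score: Double, tag: String)

// A sample of what one persona_model cell holds (values from the question's sample row).
val personaModel: Map[String, Score] = Map("Tech" -> Score(0.21678, "OTHERS"))

// Pull out a particular score and store it in a variable. Note the score is a
// Double, not an Int, so keep it as Double (convert explicitly if you must).
val temp: Double = personaModel("Tech").score
println(temp)  // 0.21678
```

In Spark itself the equivalent would be collecting the `temp` column of `newDF` to the driver, which only makes sense for small results.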
Update: if you have more than one key, the procedure is a little more complex.
First create a case class for the struct (necessary for the type cast):
case class Score(score: Double, tag: String)
Then extract all the distinct keys from the data:
val keys = oldDF.rdd
  .flatMap(r => r.getMap(1).asInstanceOf[Map[String, Score]].toList)
  .collect.map(_._1).distinct.toList
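The key-extraction step can be checked without Spark, since it is ordinary collection logic once the rows are flattened. A minimal sketch with made-up sample data standing in for the `persona_model` cells:

```scala
case class Score(score: Double, tag: String)

// Simulated rows: each element plays the role of one persona_model cell.
val rows: List[Map[String, Score]] = List(
  Map("Tech" -> Score(0.21678, "OTHERS")),
  Map("Tech" -> Score(0.9, "HIGH"), "Sports" -> Score(0.1, "LOW"))
)

// Same pipeline as the RDD version: flatten every map to (key, value)
// pairs, keep the keys, and deduplicate.
val keys: List[String] = rows.flatMap(_.toList).map(_._1).distinct
println(keys)  // List(Tech, Sports)
```

On a real table this runs as an RDD job, so the `collect` pulls only the distinct keys to the driver, not the rows.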
Finally, you can build a column that takes the score from whichever key is present:
def condition(keys: List[String]): Column = {
  keys match {
    case k :: ks =>
      when(col("persona_model")(k)("score").isNotNull,
           col("persona_model")(k)("score")).otherwise(condition(ks))
    case Nil => lit(null)
  }
}
val newDF = oldDF.select(col("id"), condition(keys))
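The `when`/`otherwise` chain above acts like a null-safe coalesce over the candidate keys: try each key in order and fall back to the next. The same fallback logic in plain Scala, using a hypothetical helper `firstScore` and made-up sample data:

```scala
case class Score(score: Double, tag: String)

// Return the score of the first key (in order) present in the map,
// mirroring when(...).otherwise(condition(ks)) with lit(null) at the end.
def firstScore(m: Map[String, Score], keys: List[String]): Option[Double] =
  keys match {
    case k :: ks => m.get(k).map(_.score).orElse(firstScore(m, ks))
    case Nil     => None
  }

val row = Map("Sports" -> Score(0.1, "LOW"))
println(firstScore(row, List("Tech", "Sports")))  // Some(0.1)
println(firstScore(row, List("Music")))           // None
```

If the per-row maps only ever contain one key each, this picks that key's score; if a row contains several of the keys, the first match in `keys` order wins.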