
How to access map values and keys stored in a data frame in Scala Spark

I have a table whose description is as follows:

# col_name              data_type               comment             

id                      string                                      
persona_model           map<string,struct<score:double,tag:string>>                     

# Partition Information      
# col_name              data_type               comment             

process_date            string          

A sample row would look something like this (tab separated):

000000E91010441BB122402A45D439E7        {"Tech":{"score":0.21678,"tag":"OTHERS"}}    2018-05-16-01              

Now I want to form another table with only 2 columns: id and its respective score.
How can I do it in Scala Spark?

Moreover, what's really bugging me is this: how can I access only a particular score, and how can I store it in an integer variable, let's say temp?

You can do this:

import org.apache.spark.sql.functions.col
val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))

Then you can extract the temp values easily.
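To make the last step concrete, here is a minimal sketch of pulling one score out of the data frame into a local Scala variable. It assumes a local SparkSession and a hypothetical stand-in for oldDF built from the sample row in the question; the names ExtractTemp, sampleDF and firstTechScore are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Mirrors the struct<score:double,tag:string> value type of persona_model.
case class Score(score: Double, tag: String)

object ExtractTemp {
  // Hypothetical stand-in for oldDF, built from the sample row in the question.
  def sampleDF(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("000000E91010441BB122402A45D439E7",
        Map("Tech" -> Score(0.21678, "OTHERS")),
        "2018-05-16-01")
    ).toDF("id", "persona_model", "process_date")
  }

  // Returns the "Tech" score of the first row as a plain Scala Double.
  def firstTechScore(spark: SparkSession, oldDF: DataFrame): Double = {
    import spark.implicits._
    val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))
    newDF.select("temp").as[Double].first()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("extract-temp").getOrCreate()
    val temp: Double = firstTechScore(spark, sampleDF(spark))
    println(temp)
    spark.stop()
  }
}
```

Note that score is declared as a double in the table, so the value comes back as a Double. Converting it with temp.toInt would truncate a score such as 0.21678 to 0, which is probably not what you want.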

Update: if you have more than one key, then the procedure is a little more complex.

First create a class for the struct (necessary for the type cast):

case class Score(score: Double, tag: String)

Then extract all the keys from the data:

val keys = oldDF.rdd
    .flatMap(r => r.getMap(1).asInstanceOf[Map[String, Score]].toList)
    .collect.map(_._1).distinct.toList

Finally, you can extract the score for each row like this:

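import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}

// Returns the score of the first key in `keys` that is present in the row, or null if none is.
def condition(keys: List[String]): Column = {
    keys match {
        case k :: ks => when(col("persona_model")(k)("score").isNotNull, col("persona_model")(k)("score")).otherwise(condition(ks))
        case Nil => lit(null)
    }
}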

val newDF = oldDF.select(col("id"), condition(keys).as("score"))
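If you need every key's score rather than only the first one found, one alternative is Spark's built-in explode function, which turns each map entry into its own row. This is a different technique from the recursive condition above, sketched under the assumption of a local SparkSession. The name ExplodeScores and the rows "a" and "b" in the usage example are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Mirrors the struct<score:double,tag:string> value type of persona_model.
case class Score(score: Double, tag: String)

object ExplodeScores {
  // explode on a map column yields one row per entry,
  // in two columns named "key" and "value" by default.
  def scoresPerKey(oldDF: DataFrame): DataFrame =
    oldDF
      .select(col("id"), explode(col("persona_model")))
      .select(col("id"), col("key"), col("value")("score").as("score"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-scores").getOrCreate()
    import spark.implicits._

    // Hypothetical rows; "b" has two keys, so it produces two output rows.
    val oldDF = Seq(
      ("a", Map("Tech" -> Score(0.21678, "OTHERS"))),
      ("b", Map("Sports" -> Score(0.5, "OTHERS"), "Tech" -> Score(0.1, "OTHERS")))
    ).toDF("id", "persona_model")

    scoresPerKey(oldDF).show(false)
    spark.stop()
  }
}
```

With this shape there is no need to collect the distinct keys to the driver first, and a row with several keys keeps all of its scores. The trade-off is that the result has one row per (id, key) pair instead of one row per id.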

