
Flatten RDD[(String,Map[String,Int])] to RDD[String,String,Int]

I am trying to flatten an RDD[(String, Map[String, Int])] into an RDD[(String, String, Int)] and ultimately save it as a DataFrame.

    // Attempt 1 -- fails: flatMap expects each element to map to a collection, but (x._1, x._2) is just a tuple
    val rdd = hashedContent.map(f => (f._1, f._2.flatMap(x => (x._1, x._2))))
    // Attempt 2 -- fails likewise: x is a (String, Int) tuple, not a collection
    val rdd = hashedContent.map(f => (f._1, f._2.flatMap(x => x)))

Both attempts fail with type mismatch errors. Any help on how to flatten a structure like this one?

EDIT:

    hashedContent -- ("A", Map("acs"->2, "sdv"->2, "sfd"->1)),
                     ("B", Map("ass"->2, "fvv"->2, "ffd"->1)),
                     ("c", Map("dg"->2, "vd"->2, "dgr"->1))

You were close:

rdd.flatMap(x => x._2.map(y => (x._1, y._1, y._2)))
   .toDF()
   .show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|  A|acs|  2|
|  A|sdv|  2|
|  A|sfd|  1|
|  B|ass|  2|
|  B|fvv|  2|
|  B|ffd|  1|
|  c| dg|  2|
|  c| vd|  2|
|  c|dgr|  1|
+---+---+---+
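
If named columns are preferred over the default _1/_2/_3, toDF also accepts column names; the names used below are just illustrative:

// Same flatten, with explicit column names ("id", "key", "value" are arbitrary examples)
rdd.flatMap(x => x._2.map(y => (x._1, y._1, y._2)))
   .toDF("id", "key", "value")
   .show()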

Data

val data = Seq(("A", Map("acs"->2, "sdv"->2, "sfd"->1)),
               ("B", Map("ass"->2, "fvv"->2, "ffd"->1)),
               ("c", Map("dg"->2, "vd"->2, "dgr"->1)))

val rdd = sc.parallelize(data)
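// rdd: org.apache.spark.rdd.RDD[(String, Map[String,Int])] -- same shape as hashedContent in the question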

For completeness: an alternative solution (which might be considered more readable) would be to first convert the RDD into a DataFrame, and then transform its structure using explode:

import org.apache.spark.sql.functions._
import spark.implicits._

rdd.toDF("c1", "map")
  .select($"c1", explode($"map"))
  .show(false)

// same result:
// +---+---+-----+
// |c1 |key|value|
// +---+---+-----+
// |A  |acs|2    |
// |A  |sdv|2    |
// |A  |sfd|1    |
// |B  |ass|2    |
// |B  |fvv|2    |
// |B  |ffd|1    |
// |c  |dg |2    |
// |c  |vd |2    |
// |c  |dgr|1    |
// +---+---+-----+
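
Since the goal is ultimately to save the result, either variant can then be written out. A minimal sketch, assuming Parquet output to a placeholder path (the format and path are not from the original post):

// Write the flattened DataFrame; path and format here are placeholders
rdd.toDF("c1", "map")
   .select($"c1", explode($"map"))
   .write
   .mode("overwrite")
   .parquet("/tmp/flattened_rdd")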
