
From RDD of array of arrays into dataframe

After some operations I end up with an RDD of Array[Any] (like the one below), in which all values are of type Int except 3, 8 and 13, which are of type String.

Array[Array[Any]] = Array(Array(1, 2, 3, 4, 5), Array(6, 7, 8, 9, 10), Array(11, 12, 13, 14, 15))

Use the following code for reference:

var exp = sc.parallelize(Array(Array(1,2,"3",4,5),Array(6,7,"8",9,10),Array(11,12,"13",14,15)))

Now I am trying to create a dataframe using a case class, with the column names and case class as follows:

case class specialchar(alpha:Int,beta:Int,gamma:String,theta:Int,zeta:Int) 

I need help with how to iterate over the RDD of Array[Array[Any]] and store the result in a dataframe. Thanks in advance.

Helper functions to handle Any:

def toInt(x: Any): Option[Int] = x match {
  case i: Int => Some(i)
  case _ => None
}

def toStr(x: Any): Option[String] = x match {
  case i: String => Some(i)
  case _ => None
}

Case class and converting the Array to a DataFrame:

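// Outside spark-shell, `import spark.implicits._` is needed for .toDF.
var exp = sc.parallelize(Array(Array(1,2,"3",4,5),Array(6,7,"8",9,10),Array(11,12,"13",14,15)))
case class specialchar(alpha:Int,beta:Int,gamma:String,theta:Int,zeta:Int)

// Start from an empty DataFrame that already has the desired schema.
var specialCharDf = Seq.empty[specialchar].toDF

// collect() brings every row to the driver, so this only suits small data.
exp.collect().foreach(x => {
    // Each field falls back to a default (1 or "1") when its type does not match.
    val a:Int = toInt(x(0)).getOrElse(1)
    val b:Int = toInt(x(1)).getOrElse(1)
    val c:String = toStr(x(2)).getOrElse("1")
    val d:Int = toInt(x(3)).getOrElse(1)
    val e:Int = toInt(x(4)).getOrElse(1)

    println((a, b, c, d, e))

    // Wrap the row in a one-row DataFrame and union it with the accumulated result.
    val specialcharTempDf = Seq(specialchar(a,b,c,d,e)).toDF
    specialCharDf = specialcharTempDf.union(specialCharDf)
})

specialCharDf.printSchema() // follows the desired schema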
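As a possible alternative to looking up each index with a default, the per-row conversion can be sketched as a single pattern match. This is only a sketch under assumptions: `SpecialChar` and `RowConversion` are hypothetical names introduced here, not part of the original post, and rows that do not have exactly the shape (Int, Int, String, Int, Int) are dropped rather than filled with defaults.

```scala
// Hypothetical names for illustration; the original post uses `specialchar`.
final case class SpecialChar(alpha: Int, beta: Int, gamma: String, theta: Int, zeta: Int)

object RowConversion {
  // Returns Some(record) only when the row has exactly five elements of the
  // expected types (Int, Int, String, Int, Int); any other shape yields None.
  def toSpecialChar(row: Array[Any]): Option[SpecialChar] = row match {
    case Array(a: Int, b: Int, c: String, d: Int, e: Int) =>
      Some(SpecialChar(a, b, c, d, e))
    case _ =>
      None
  }
}
```

In spark-shell, something like `exp.flatMap(row => RowConversion.toSpecialChar(row).toSeq).toDF()` would then be expected to build the DataFrame on the executors without `collect()`, assuming the implicits that provide `.toDF` are in scope.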

EDIT: akhil mentioned that in the end they should all be integers. New solution below:

var exp = sc.parallelize(Array(Array(1,2,"3",4,5),Array(6,7,"8",9,10),Array(11,12,"13",14,15)))
case class specialchar(alpha:Int,beta:Int,gamma:Int,theta:Int,zeta:Int)

// Start from an empty DataFrame that already has the desired all-Int schema.
var specialCharDf = Seq.empty[specialchar].toDF

exp.collect().foreach(x => {
    val a:Int = toInt(x(0)).getOrElse(1)
    val b:Int = toInt(x(1)).getOrElse(1)
    // The third field arrives as a String; read it, then convert it to Int.
    val c:String = toStr(x(2)).getOrElse("1")
    val f = c.toInt // throws NumberFormatException if the string is not numeric
    val d:Int = toInt(x(3)).getOrElse(1)
    val e:Int = toInt(x(4)).getOrElse(1)

    println((a, b, f, d, e))

    val specialcharTempDf = Seq(specialchar(a,b,f,d,e)).toDF
    specialCharDf = specialcharTempDf.union(specialCharDf)
})

specialCharDf.printSchema() // follows the desired schema
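Since every column ends up as an Int, one way to avoid treating the third field specially is a single converter that accepts either an Int or a numeric String. The following is a minimal sketch, not the answer's method: `LenientInt`, `parse` and `parseRow` are hypothetical names, and values that cannot be read as an Int produce `None` instead of a default or an exception.

```scala
import scala.util.Try

object LenientInt {
  // Accepts an Int as-is and tries to parse a String; everything else is None.
  def parse(x: Any): Option[Int] = x match {
    case i: Int    => Some(i)
    case s: String => Try(s.trim.toInt).toOption
    case _         => None
  }

  // Converts a whole row; None if any element cannot be read as an Int.
  def parseRow(row: Array[Any]): Option[Vector[Int]] = {
    val parsed = row.toVector.map(parse)
    if (parsed.forall(_.isDefined)) Some(parsed.flatten) else None
  }
}
```

With this, a row such as `Array(1, 2, "3", 4, 5)` can be converted in one call, and the caller decides what to do with rows that fail to parse.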
