
How can I convert a Seq into a Spark RDD?

I'm using Spark with Scala and the Play Framework. I have a Seq like this:

// a sequence of Book objects
val books: Seq[Book]

which I fill from a JSON file using a Play JSON Format:

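import play.api.libs.json._
import play.api.libs.functional.syntax._

// Reads and writes a Libri from the JSON fields "City", "GEN" and "SER".
// Assumes a case class along the lines of Libri(city: String, gen: Int, ser: Int).
implicit val bookFormat: Format[Libri] = (
  (JsPath \ "City").format[String] and
  (JsPath \ "GEN").format[Int] and
  (JsPath \ "SER").format[Int]
)(Libri.apply, unlift(Libri.unapply))

// Parse the JSON text (expected to be a JSON array) into a Seq[Libri].
val books = Json.parse(JsonString).as[Seq[Libri]]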

How can I convert this Seq into a Spark RDD? I want to run some queries against it, so I need "registerTempTable" and "rdd.sqlContext.sql".

You can use sparkContext.parallelize(books). parallelize takes a local collection and distributes it as an RDD. You can pass an optional second parameter (numSlices) to set the number of partitions the Seq is split into.
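To make this concrete, here is a minimal sketch of the whole path from a local Seq to a SQL query. It assumes Spark 2.x or later, where SparkSession and createOrReplaceTempView take the place of the older SQLContext and registerTempTable mentioned in the question. The Libri field names, the sample rows, the "books" view name, and the citiesWithGenAbove helper are all made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the question's case class; the field names are assumed.
// It must be defined at the top level so Spark can derive an encoder for it.
case class Libri(city: String, gen: Int, ser: Int)

object SeqToRdd {

  // Illustrative helper: parallelizes the Seq, registers it as a temp view,
  // and returns the cities whose gen value is greater than minGen.
  def citiesWithGenAbove(spark: SparkSession, books: Seq[Libri], minGen: Int): Seq[String] = {
    import spark.implicits._ // enables rdd.toDF() and .as[String]

    // Distribute the local collection as an RDD split into 2 partitions.
    val rdd = spark.sparkContext.parallelize(books, numSlices = 2)

    // Convert the RDD to a DataFrame and expose it to SQL under the name "books".
    rdd.toDF().createOrReplaceTempView("books")

    spark.sql(s"SELECT city FROM books WHERE gen > $minGen ORDER BY city")
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real job would get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("seq-to-rdd")
      .master("local[*]")
      .getOrCreate()

    // Sample data standing in for Json.parse(JsonString).as[Seq[Libri]].
    val books = Seq(Libri("Rome", 1, 10), Libri("Milan", 2, 20))

    citiesWithGenAbove(spark, books, minGen = 1).foreach(println)

    spark.stop()
  }
}
```

On Spark 1.x the same idea should apply with a SQLContext built from the SparkContext, calling registerTempTable on the DataFrame and sqlContext.sql for the query. If the data is only ever queried through SQL, calling toDF() directly on the Seq (with the implicits imported) skips the explicit RDD step.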
