
Create a Dataset from a DataFrame built from a sequence of tuples, without using a case class

I have a sequence of tuples from which I made an RDD and converted it to a DataFrame, like below.

val rdd = sc.parallelize(Seq((1, "User1"), (2, "user2"), (3, "user3")))
import spark.implicits._
val df = rdd.toDF("Id", "firstname")

Now I want to create a Dataset from df. How can I do that?

Simply df.as[(Int, String)] is what you need to do. Please see the full example below.

package com.examples

import org.apache.log4j.Level

import org.apache.spark.sql.{Dataset, SparkSession}

object SeqTuplesToDataSet {
  org.apache.log4j.Logger.getLogger("org").setLevel(Level.ERROR)
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName(this.getClass.getName).config("spark.master", "local").getOrCreate()
    spark.sparkContext.setLogLevel("ERROR")
    // Build an RDD from the sequence of tuples and convert it to a DataFrame
    val rdd = spark.sparkContext.parallelize(Seq((1, "User1"), (2, "user2"), (3, "user3")))
    import spark.implicits._
    val df = rdd.toDF("Id", "firstname")
    // Convert the DataFrame to a typed Dataset of tuples using the tuple encoder from spark.implicits._
    val myds: Dataset[(Int, String)] = df.as[(Int, String)]
    myds.show()
  }
}

Result:

+---+---------+
| Id|firstname|
+---+---------+
|  1|    User1|
|  2|    user2|
|  3|    user3|
+---+---------+
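
As a side note, if the sequence of tuples is already a local collection on the driver, you can also build the Dataset directly and skip the intermediate RDD/DataFrame. A minimal sketch (the variable name directDs is just illustrative; the column names Id and firstname are carried over from the question):

import spark.implicits._

// Create the Dataset straight from the local Seq of tuples.
// The tuple encoder comes from spark.implicits._, so no case class is needed here either.
val directDs: Dataset[(Int, String)] = Seq((1, "User1"), (2, "user2"), (3, "user3")).toDS()

// Optionally rename the default tuple columns (_1, _2) to match the DataFrame version.
directDs.toDF("Id", "firstname").as[(Int, String)].show()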
