Scala: Read a text file and create tuples from it

How can I create tuples from the RDD below?

// reading a text file "b.txt" and creating RDD 
val rdd = sc.textFile("/home/training/desktop/b.txt") 

b.txt dataset -->


If you want an Array[Tuple4], you can do the following:

scala> val rdd = sc.textFile("file:/home/training/desktop/b.txt")
rdd: org.apache.spark.rdd.RDD[String] = file:/home/training/desktop/b.txt MapPartitionsRDD[5] at textFile at <console>:24

scala> val arrayTuples = rdd.map(line => line.split(",")).map(array => (array(0), array(1), array(2), array(3))).collect
arrayTuples: Array[(String, String, String, String)] = Array((" Ankita",26,BigData,newbie), (" Shikha",30,Management,Expert))

You can then access each field through the tuple accessors (_1 to _4):

scala> arrayTuples.foreach(x => println(x._3))
BigData
Management
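As a minimal sketch of field access that runs without Spark, the example below uses a local Seq in place of the RDD. The object name TupleAccess and the two sample lines are assumptions, modelled on the REPL output above rather than taken from the actual b.txt. It shows positional access with _3 next to destructuring, which gives each field a readable name.

```scala
object TupleAccess {
  // Local stand-in for the lines of b.txt (assumed, based on the REPL output above)
  val lines: Seq[String] =
    Seq("Ankita,26,BigData,newbie", "Shikha,30,Management,Expert")

  // Same split-then-tuple step as in the RDD version
  val tuples: Seq[(String, String, String, String)] =
    lines.map(_.split(",")).map(a => (a(0), a(1), a(2), a(3)))

  // Positional access with _1 ... _4
  val thirdFields: Seq[String] = tuples.map(_._3)

  // Destructuring in a pattern gives each field a readable name
  val described: Seq[String] = tuples.map {
    case (name, age, subject, level) => s"$name ($age): $subject, $level"
  }

  def main(args: Array[String]): Unit = {
    thirdFields.foreach(println)
    described.foreach(println)
  }
}
```

With an RDD the same map calls apply unchanged; only the printing should happen after collect (or on a small sample), since println inside an RDD transformation runs on the executors.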


If the lines of the input file have a varying number of fields, you can use pattern matching on the split array:

scala> val arrayTuples = rdd.map(line => line.split(",") match {
     | case Array(a, b, c, d) => (a,b,c,d)
     | case Array(a,b,c) => (a,b,c)
     | }).collect
arrayTuples: Array[Product with Serializable] = Array((Ankita,26,BigData,newbie), (Shikha,30,Management,Expert), (Anita,26,big))
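The Array[Product with Serializable] type in the output above is the common supertype of Tuple3 and Tuple4, and it no longer exposes _1 to _4. The sketch below illustrates the consequence on a local Seq instead of an RDD. The object name MixedArity and the sample lines are assumptions made for illustration.

```scala
object MixedArity {
  // Local stand-in for an input with 4-field and 3-field lines (assumed data)
  val lines: Seq[String] = Seq("Ankita,26,BigData,newbie", "Anita,26,big")

  // Explicitly typed as Product, the common supertype of Tuple3 and Tuple4.
  // Any other field count would throw a scala.MatchError at runtime.
  val rows: Seq[Product] = lines.map { line =>
    line.split(",") match {
      case Array(a, b, c, d) => (a, b, c, d)
      case Array(a, b, c)    => (a, b, c)
    }
  }

  // rows.head._3 does not compile here: Product has no _3 member.
  // Only the generic Product API is left, and it returns Any.
  val arities: Seq[Int] = rows.map(_.productArity)
  val thirdFields: Seq[Any] = rows.map(_.productElement(2))

  def main(args: Array[String]): Unit = {
    rows.foreach(r => println(r.productIterator.mkString(" | ")))
  }
}
```

Because every element comes back as Any, later code has to cast or pattern match again, which is the main cost of mixing tuple sizes in one collection.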

Updated again

As @eliasah pointed out, the approach above is bad practice: the result is typed as Product with Serializable, so the fields can only be reached through the product iterator. Following his suggestion, we should know the maximum number of elements per line and use the following logic, which assigns default values to missing elements:

import scala.util.Try

// Try(array(i)) catches the ArrayIndexOutOfBoundsException for missing fields
val arrayTuples = rdd.map(line => line.split(",")).map(array => (Try(array(0)) getOrElse("Empty"), Try(array(1)) getOrElse(0), Try(array(2)) getOrElse("Empty"), Try(array(3)) getOrElse("Empty"))).collect
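An alternative sketch of the same default-value idea uses lift instead of Try, so a missing index yields None rather than a caught exception. The object PaddedTuples, the helper toTuple4, and the sample lines are hypothetical names and data introduced for illustration. Unlike the snippet above, which defaults the second field to the Int 0 and therefore widens that field to Any, this version assumes a String default for every field so the result stays a plain Tuple4 of Strings.

```scala
object PaddedTuples {
  // Local stand-in for the RDD contents (assumed data)
  val lines: Seq[String] = Seq("Ankita,26,BigData,newbie", "Anita,26,big")

  // Hypothetical helper: always returns four String fields,
  // filling missing positions with the given default.
  def toTuple4(line: String, default: String = "Empty"): (String, String, String, String) = {
    // lift turns index lookup into Int => Option[String]
    val field: Int => Option[String] = line.split(",").toVector.lift
    (
      field(0).getOrElse(default),
      field(1).getOrElse(default),
      field(2).getOrElse(default),
      field(3).getOrElse(default)
    )
  }

  val tuples: Seq[(String, String, String, String)] = lines.map(l => toTuple4(l))

  def main(args: Array[String]): Unit = tuples.foreach(println)
}
```

With Spark, the helper could be passed straight to the RDD, for example rdd.map(line => PaddedTuples.toTuple4(line)).collect, assuming the object is available on the executors.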

As @philantrovert pointed out, if we are not using the REPL we can verify the output in the following way:


which results in


