Spark-submit cannot access local file system

Question

This very simple Scala code fails at the first count() method call.

def main(args: Array[String]): Unit = {
  // create Spark context with Spark configuration
  val sc = new SparkContext(new SparkConf().setAppName("Spark File Count"))
  // recursiveListFiles is defined elsewhere (not shown)
  val fileList = recursiveListFiles(new File("C:/data")).filter(_.isFile).map(file => file.getName())
  val filesRDD = sc.parallelize(fileList)
  val linesRDD = sc.textFile("file:///temp/dataset.txt")
  val lines = linesRDD.count() // first count() call: fails here
  val files = filesRDD.count()
}

I don't want to set up an HDFS installation for this right now. How do I configure Spark to use the local file system? This works with spark-shell.
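
The snippet above calls recursiveListFiles without showing its definition. For readers who want to reproduce the problem, here is a minimal sketch of what such a helper might look like. The object name FileListing and the exact behaviour (returning both files and directories, which the caller then filters with isFile) are assumptions, not the asker's actual code.

```scala
import java.io.File

object FileListing {
  // Hypothetical helper: returns every file and directory found below `dir`,
  // descending into subdirectories recursively.
  def recursiveListFiles(dir: File): Seq[File] = {
    // listFiles returns null when `dir` is not a directory or cannot be read
    val children = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty)
    children ++ children.filter(_.isDirectory).flatMap(recursiveListFiles)
  }
}
```

This part runs on the driver only and does not touch Spark, so it should behave the same under spark-shell and spark-submit.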

Answer 1

To read a file from the local file system (a Windows directory), use the pattern below.

val fileRDD = sc.textFile("C:\\Users\\Sandeep\\Documents\\test\\test.txt")

The following sample program reads data from the local file system.

package com.scala.example

import org.apache.spark._

object Test extends Serializable {
  // Run locally with a single worker thread; no cluster or HDFS needed
  val conf = new SparkConf().setAppName("read local file")
  conf.set("spark.executor.memory", "100M")
  conf.setMaster("local")

  val sc = new SparkContext(conf)
  val input = "C:\\Users\\Sandeep\\Documents\\test\\test.txt"

  def main(args: Array[String]): Unit = {
    val fileRDD = sc.textFile(input)
    // Split each line on commas and count occurrences of each token
    val counts = fileRDD.flatMap(line => line.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    // Stop the Spark context
    sc.stop()
  }
}
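
Note that the path in the question, file:///temp/dataset.txt, carries no drive letter, while the path in this answer does. If you prefer the file: URI form over escaped backslashes, the conversion can be sketched as below. The helper toFileUri and the object LocalPaths are hypothetical names introduced for illustration. The sketch does plain string rewriting only (no percent-encoding of spaces or other special characters), and it assumes that Spark's path handling on Windows accepts URIs of the form file:///C:/...

```scala
object LocalPaths {
  // Hypothetical helper: turns a Windows path such as C:\temp\dataset.txt
  // into a file: URI string with forward slashes and the drive letter kept.
  // Simplification: special characters are not percent-encoded.
  def toFileUri(windowsPath: String): String = {
    val forward = windowsPath.replace('\\', '/')
    "file:///" + forward.stripPrefix("/")
  }

  def main(args: Array[String]): Unit = {
    // prints file:///C:/temp/dataset.txt
    println(toFileUri("C:\\temp\\dataset.txt"))
  }
}
```

The resulting string could then be passed to sc.textFile in place of the escaped Windows path.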

Answer 2

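val sc = new SparkContext(new SparkConf().setAppName("Spark File Count").setMaster("local[8]"))

Setting the master explicitly on the SparkConf might help. Here local[8] runs Spark locally with 8 worker threads.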

solution1
1 ACCEPTED 2016-12-18 05:32:48

solution2
0 2016-12-16 05:26:10
