
Source.fromFile not working for HDFS file path

I am trying to read file contents from HDFS using Source.fromFile(). It works fine when the file is on my local file system, but it throws an error when I try to read a file from HDFS.

import scala.io.Source

object CheckFile {
    def main(args: Array[String]): Unit = {
        for (line <- Source.fromFile("/user/cloudera/xxxx/File").getLines()) {
            println(line)
        }
    }
}

Error:

java.io.FileNotFoundException: hdfs:/quickstart.cloudera:8080/user/cloudera/xxxx/File (No such file or directory)

I searched but could not find a solution to this.

Please help.

If you are using Spark, you should use SparkContext to load the files. Source.fromFile only reads from the local file system.

Say you have your SparkContext as sc:

val fromFile = sc.textFile("hdfs://path/to/file.txt")

Should do the trick. You might have to specify the node address, though.
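To see why Source.fromFile cannot work here, a minimal self-contained sketch (no Spark or Hadoop needed; the namenode host, port, and path are hypothetical): Source.fromFile hands the string straight to the local file system, so an "hdfs://..." URI is looked up as a literal local path and fails with the FileNotFoundException shown above.

```scala
import java.io.FileNotFoundException
import scala.io.Source
import scala.util.{Failure, Try}

object WhyItFails {
  def main(args: Array[String]): Unit = {
    // Hypothetical HDFS URI: Source.fromFile treats it as a LOCAL path,
    // so the whole "hdfs://..." string is looked up on the local disk.
    val attempt = Try(Source.fromFile("hdfs://namenode:8020/some/file"))
    attempt match {
      case Failure(_: FileNotFoundException) =>
        // This is the same failure mode as in the question.
        println("FileNotFoundException: the URI was treated as a local path")
      case other =>
        println(s"Unexpected: $other")
    }
  }
}
```

That is why switching to sc.textFile matters: Spark resolves the hdfs:// scheme through the Hadoop file system layer instead of java.io.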

UPDATE:

To add to the comment: you want to read some data from HDFS and store it as a Scala collection. This is bad practice, as the file might contain millions of lines and the program would crash due to insufficient memory; you should use RDDs, not built-in Scala collections. Nevertheless, if this is what you want, you could do:

val fromFile = sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray

which would produce a local collection of the desired type (Array in this case).

sc.textFile("hdfs://path/to/file.txt").toLocalIterator.toArray.mkString gives the result as a single String.
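The materialization steps above can be mimicked with plain Scala collections (no Spark required; the data here is made up) to show what each stage yields: an iterator streams lazily, toArray pulls everything into driver memory, and mkString flattens it into one String.

```scala
object CollectSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the lines an RDD[String] would deliver (hypothetical data).
    val lines: Iterator[String] = Iterator("alpha", "beta", "gamma")
    // toArray materializes the whole iterator into a local collection --
    // the step that can exhaust memory on large files.
    val asArray: Array[String] = lines.toArray
    // mkString then joins the elements into a single String.
    val asString: String = asArray.mkString("\n")
    println(asArray.length) // 3
    println(asString)
  }
}
```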
