How to read HDFS file from Scala code

I am new to Scala and HDFS.

I am able to read a local file from Scala code, but how do I read a file from HDFS?

import scala.io.Source

object ReadLine {
  def main(args: Array[String]): Unit = {
    if (args.length > 0) {
      // Source.fromFile reads from the local file system
      for (line <- Source.fromFile(args(0)).getLines())
        println(line)
    }
  }
}

As the argument I passed hdfs://localhost:9000/usr/local/log_data/file1, but it throws a FileNotFoundException. I am definitely missing something. Can anyone help me out here?

The scala.io.Source API cannot read from HDFS. Source.fromFile treats its argument as a path on the local file system, which is why passing an hdfs:// URI ends in a FileNotFoundException.

Spark

If you want to read from HDFS, I would recommend using Spark, where you read the file through the SparkContext (sc below).

val lines = sc.textFile(args(0))  //args(0) should be hdfs:///usr/local/log_data/file1
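
For context, here is a minimal sketch of a complete program around that call, since the one-liner assumes an existing sc. It assumes Spark 2.x or later is on the classpath. The object name ReadHdfsWithSpark, the helper firstLines, and the local[*] master are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object ReadHdfsWithSpark {
  // Hypothetical helper: returns up to n lines of the file at `path`.
  // `path` may be an hdfs:// URI or any other URI Spark can resolve.
  def firstLines(spark: SparkSession, path: String, n: Int): Array[String] =
    spark.sparkContext.textFile(path).take(n)

  def main(args: Array[String]): Unit = {
    // Assumption: local[*] is only for trying this out on one machine;
    // on a cluster the master is normally supplied by spark-submit.
    val spark = SparkSession.builder()
      .appName("ReadHdfsWithSpark")
      .master("local[*]")
      .getOrCreate()
    try {
      if (args.nonEmpty) firstLines(spark, args(0), 10).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

take(n) is used instead of collect() so that only a few lines are pulled to the driver when the file is large.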

No Spark

If you don't want to use Spark, you can use a BufferedReader or StreamReader, or the Hadoop FileSystem API. For example:

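import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
val hdfs = FileSystem.get(new URI("hdfs://yourUrl:port/"), new Configuration())
val path = new Path("/path/to/file/")
val stream = hdfs.open(path)
// readLine returns null at end of file, so the stream stops there
def readLines = Stream.continually(stream.readLine).takeWhile(_ != null)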
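
The BufferedReader route mentioned above could look roughly like the following sketch. It assumes the hadoop-client library is on the classpath and that the file is UTF-8 text. The names ReadHdfsLines and readAllLines are hypothetical and chosen only for this example.

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.URI
import java.nio.charset.StandardCharsets

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ReadHdfsLines {
  // Hypothetical helper: reads every line of the file at `uri` into memory.
  // `uri` is a full URI such as hdfs://localhost:9000/usr/local/log_data/file1.
  def readAllLines(uri: String): List[String] = {
    // The scheme and authority of the URI select the file system implementation.
    val fs = FileSystem.get(new URI(uri), new Configuration())
    // Assumption: the file is UTF-8 encoded text.
    val reader = new BufferedReader(
      new InputStreamReader(fs.open(new Path(uri)), StandardCharsets.UTF_8))
    try {
      // readLine() returns null at end of file, so stop there.
      Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    } finally {
      reader.close() // also closes the underlying HDFS input stream
    }
  }

  def main(args: Array[String]): Unit =
    if (args.nonEmpty) readAllLines(args(0)).foreach(println)
}
```

Wrapping the stream in a BufferedReader avoids the deprecated DataInputStream.readLine and makes the character encoding explicit. The FileSystem instance is left open on purpose, because FileSystem.get may return a cached instance shared with other code.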
