
How to write to HDFS programmatically?

So after 36 hours of experimenting with this and that, I have finally managed to get a cluster up and running, but now I am confused about how to write files to it using Java. A tutorial said this program should be used, but I don't understand it at all and it doesn't work either.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileWriteToHDFS {

    public static void main(String[] args) throws Exception {

        // Source file in the local file system
        String localSrc = args[0];
        // Destination path in HDFS
        String dst = args[1];

        // Input stream for the local file to be written to HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));

        // Load the Hadoop configuration (core-site.xml etc. from the classpath)
        Configuration conf = new Configuration();
        System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));

        // Get a handle to the destination file system and create the file there
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst));

        // Copy from local disk to HDFS; 4096 is the buffer size,
        // and true closes both streams when the copy finishes
        IOUtils.copyBytes(in, out, 4096, true);

        System.out.println(dst + " copied to HDFS");
    }
}

My confusion is: how does this piece of code identify the specifics of my cluster? How will it know where the masternode is and where the slavenodes are?

Furthermore, when I run this code, provide a local file as the source, and leave the destination blank (or provide only a file name), the program writes the file back to my local storage and not to the location I defined as storage space for my namenode and datanodes. Should I be providing this path manually? How does this work? Please suggest a blog that can help me understand it better, or a minimal working example.

First off, you'll need to add some Hadoop libraries to your classpath. Without those, no, that code won't work.
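For example, if you compile and run against the libraries that ship with your installation, something like this works (assuming the hadoop command is on your PATH; the file paths are illustrative):

javac -cp "$(hadoop classpath)" FileWriteToHDFS.java
java -cp ".:$(hadoop classpath)" FileWriteToHDFS /path/to/localfile.txt /user/yourname/file.txt

hadoop classpath prints the classpath of the installed Hadoop distribution, so you pick up the same library versions the cluster uses.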

How will it know where the masternode is and where the slavenodes are?

From new Configuration() and the subsequent conf.get("fs.defaultFS") call.

It reads core-site.xml from the directory given by the HADOOP_CONF_DIR environment variable and returns the address of the namenode. The client only needs to talk to the namenode; the namenode returns the locations of the datanodes, to which the file blocks are then written.
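If you run the code outside the Hadoop launcher scripts, the config files may not be on your classpath; in that case you can point the client at the namenode explicitly. A minimal sketch (the class name and the host/port are made up for illustration, so substitute the fs.defaultFS value from your own core-site.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnectExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode address; copy the real value from your core-site.xml
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        // The returned FileSystem now talks to HDFS rather than the local disk
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to -- " + fs.getUri());
    }
}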

the program writes the file back to my local storage

It's not clear where you've configured the filesystem, but the default is file://, i.e. your local disk. You change this in core-site.xml. If you follow the Hadoop documentation, the pseudo-distributed cluster setup covers this.
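For reference, the single-node (pseudo-distributed) setup in the Hadoop docs configures core-site.xml roughly like this; hdfs://localhost:9000 assumes the namenode runs on the same machine:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>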

It's also not very clear why you need your own Java code when a simple hdfs dfs -put will do the same thing.
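For example, to copy a local file into HDFS from the command line (the paths are illustrative):

hdfs dfs -put /path/to/localfile.txt /user/yourname/

This uses the same client configuration, so it is also a quick way to check that fs.defaultFS is set up correctly before debugging your own code.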
