
What is the correct way to get a Hadoop FileSystem object that can be used for reading from/writing to HDFS?

What is the correct way to create a FileSystem object that can be used for reading from/writing to HDFS? In some examples I've found, they do something like this:

final Configuration conf = new Configuration();
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));

final FileSystem fs = FileSystem.get(conf);

From looking at the documentation for the Configuration class, it looks like the properties from core-site.xml are automatically loaded when the object is created if that file is on the classpath, so there is no need to set it again.

I haven't found anything that says why adding hdfs-site.xml would be required, and it seems to work fine without it.

Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?

FileSystem needs only one configuration key to connect to HDFS. In older releases it was fs.default.name; from Hadoop 2 (YARN) onward it is fs.defaultFS. So the following snippet is sufficient for the connection.

Configuration conf = new Configuration();
// use "fs.defaultFS" on Hadoop 2.x and later; "fs.default.name" on older releases
conf.set("fs.defaultFS", "hdfs://host:port");

FileSystem fs = FileSystem.get(conf);       

Tip: check which of the two keys exists in your core-site.xml and set the same value in conf. If the machine you are running the code from does not have a host name mapping for the NameNode, use its IP address instead. On a MapR cluster the value will have a prefix like maprfs://.
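As an illustration of the tip above, you can also pass the URI directly to FileSystem.get instead of setting the key in conf. This is only a minimal sketch; the class name, IP address and port below are placeholders, not values from the answer:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnect {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // If the client machine cannot resolve the NameNode host name,
        // put the NameNode's IP address in the URI (placeholder values here).
        FileSystem fs = FileSystem.get(URI.create("hdfs://10.0.0.1:8020"), conf);
        System.out.println("Connected to: " + fs.getUri());
        fs.close();
    }
}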

For the question:

Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?

I ran an experiment: if you are using CDH (Cloudera's Distribution Including Apache Hadoop; my version is Hadoop 2.6.0-cdh5.11.1), it is not safe to use core-site.xml alone. It throws an exception:

Request processing failed; nested exception is java.lang.IllegalArgumentException: java.net.UnknownHostException

After adding hdfs-site.xml as well, it worked.
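One likely reason is that CDH clusters are often configured for NameNode HA, where fs.defaultFS points at a logical nameservice rather than a real host; the mapping from that nameservice to the actual NameNode addresses lives in hdfs-site.xml, so without it the client tries to resolve the nameservice as a host name and fails with UnknownHostException. A minimal sketch, assuming the client configuration files live under /etc/hadoop/conf (adjust the paths for your cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder paths -- point these at your cluster's client configs.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        // hdfs-site.xml carries dfs.nameservices and the per-NameNode RPC
        // addresses; without it an HA URI such as hdfs://nameservice1
        // cannot be resolved and the client throws UnknownHostException.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default FS: " + fs.getUri());
        fs.close();
    }
}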

Here's a block of code from one of my projects for building a Configuration usable for HBase, HDFS and MapReduce. Note that addResource will search the active classpath for the resource entries you name.

Configuration config = new Configuration();   // picks up core-site.xml from the classpath
HBaseConfiguration.addHbaseResources(config);  // adds hbase-default.xml and hbase-site.xml
config.addResource("mapred-default.xml");
config.addResource("mapred-site.xml");

My classpath definitely includes the directories housing core-site.xml, hdfs-site.xml, mapred-site.xml, and hbase-site.xml.
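For completeness, here is a small usage sketch built on such a Configuration. The file path and content are placeholders, not part of the original project code:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();  // *-site.xml picked up from the classpath
        FileSystem fs = FileSystem.get(config);

        Path file = new Path("/tmp/example.txt");    // placeholder path
        // Write a small file, overwriting it if it already exists.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        // Read it back and copy the bytes to stdout.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}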
