What is the correct way to create a FileSystem object that can be used for reading from/writing to HDFS? In some examples I've found, they do something like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final Configuration conf = new Configuration();
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
final FileSystem fs = FileSystem.get(conf);
From looking at the documentation for the Configuration class, it looks like the properties from core-site.xml are automatically loaded when the object is created if that file is on the classpath, so there is no need to set it again.
I haven't found anything that says why adding hdfs-site.xml would be required, and it seems to work fine without it.
Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?
FileSystem needs only one configuration key to successfully connect to HDFS. Previously it was fs.default.name; from Hadoop 2.x (YARN) onward it has changed to fs.defaultFS. So the following snippet is sufficient for the connection.
Configuration conf = new Configuration();
// "fs.default.name" on Hadoop 1.x, "fs.defaultFS" on Hadoop 2.x and later
conf.set("fs.defaultFS", "hdfs://host:port");
FileSystem fs = FileSystem.get(conf);
Tip: Check core-site.xml to see which of those keys exists, and set the same value in conf. If the machine you are running the code from cannot resolve the host name, use the NameNode's IP address instead. On a MapR cluster the value will have a prefix like maprfs://.
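If you are unsure which key your cluster uses, here is a minimal sketch (my own addition, assuming core-site.xml is on the classpath) that prints both:

Configuration conf = new Configuration();
// Hadoop 2.x maps the deprecated fs.default.name onto fs.defaultFS,
// so these usually print the same value; whichever is non-null is set.
System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
System.out.println("fs.default.name = " + conf.get("fs.default.name"));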
Regarding the question:
Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?
I ran an experiment: if you are using CDH (Cloudera's Distribution Including Apache Hadoop; my version is Hadoop 2.6.0-cdh5.11.1), it is not safe to use core-site.xml only. It throws an exception:
Request processing failed; nested exception is java.lang.IllegalArgumentException: java.net.UnknownHostException
If you add hdfs-site.xml as well, it works. The likely reason: on such clusters fs.defaultFS points to an HA nameservice, a logical name rather than a real host, and the properties that resolve that logical name to actual NameNode addresses live in hdfs-site.xml. Without that file, the client tries to resolve the nameservice through DNS and fails with UnknownHostException.
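To make that failure mode concrete, here is a hedged sketch of the HA properties that normally live in hdfs-site.xml; the nameservice and host names below are illustrative assumptions, not values from the post:

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://nameservice1"); // logical name, not a real DNS host
// Without the hdfs-site.xml properties below, the client tries to resolve
// "nameservice1" as a host name and fails with UnknownHostException.
conf.set("dfs.nameservices", "nameservice1");
conf.set("dfs.ha.namenodes.nameservice1", "nn1,nn2");
conf.set("dfs.namenode.rpc-address.nameservice1.nn1", "namenode1.example.com:8020");
conf.set("dfs.namenode.rpc-address.nameservice1.nn2", "namenode2.example.com:8020");
conf.set("dfs.client.failover.proxy.provider.nameservice1",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
FileSystem fs = FileSystem.get(conf);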
Here's a block of code from one of my projects for building a Configuration usable for HBase, HDFS, and MapReduce. Notice that addResource will search the active classpath for the resource entries you name.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration config = new Configuration();
HBaseConfiguration.addHbaseResources(config); // loads hbase-default.xml and hbase-site.xml
config.addResource("mapred-default.xml");
config.addResource("mapred-site.xml");
My classpath definitely includes the directories housing core-site.xml, hdfs-site.xml, mapred-site.xml, and hbase-site.xml.
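One caveat worth adding (my own note, not from the original answer): Configuration loads classpath resources in quiet mode by default, so a missing file is skipped without an error. A quick sanity check that the files actually resolve:

import java.net.URL;

String[] resources = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml", "hbase-site.xml"};
for (String name : resources) {
    // null means the file is not on the classpath, and addResource
    // would quietly load nothing for it
    URL url = Thread.currentThread().getContextClassLoader().getResource(name);
    System.out.println(name + " -> " + url);
}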