What is the correct way to get a Hadoop FileSystem object that can be used for reading from/writing to HDFS?
What is the correct way to create a FileSystem object that can be used for reading from/writing to HDFS? In some examples I've found, they do something like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final Configuration conf = new Configuration();
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
final FileSystem fs = FileSystem.get(conf);
From looking at the documentation for the Configuration class, it looks like the properties from core-site.xml are automatically loaded when the object is created if that file is on the classpath, so there is no need to set it again.
I haven't found anything that says why adding hdfs-site.xml would be required, and it seems to work fine without it.
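For reference, a minimal sketch of that simpler approach, assuming core-site.xml is on the classpath (the printed key is just a way to verify the file was picked up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// core-site.xml on the classpath is loaded by the default constructor,
// so no addResource() calls are needed.
Configuration conf = new Configuration();
System.out.println(conf.get("fs.defaultFS")); // should show the value from core-site.xml
FileSystem fs = FileSystem.get(conf);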
Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?
FileSystem needs only one configuration key to successfully connect to HDFS. Previously it was fs.default.name; from Hadoop 2.x (the YARN era) onward it is fs.defaultFS. So the following snippet is sufficient for the connection.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://host:port"); // use "fs.default.name" on older releases
FileSystem fs = FileSystem.get(conf);
Tip: check core-site.xml to see which of the two keys exists, and set the same value associated with it in conf. If the machine you run the code from has no hostname mapping for the NameNode, use its IP address instead. On a MapR cluster the value will have a prefix like maprfs://.
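To round this out, here is a sketch of actually reading from and writing to HDFS with the resulting FileSystem; hdfs://namenode:8020 and /tmp/example.txt are placeholder values:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/example.txt");

            // Write a line to HDFS (overwrite if the file exists).
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
        }
    }
}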
For the question:

Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?
I ran an experiment: if you are using CDH (Cloudera's Distribution Including Apache Hadoop; my version is Hadoop 2.6.0-cdh5.11.1), it is not safe to use core-site.xml only. It throws an exception:
Request processing failed; nested exception is java.lang.IllegalArgumentException: java.net.UnknownHostException
If you add hdfs-site.xml to the classpath, it works. The likely reason is that on such clusters fs.defaultFS points to a logical HA nameservice name rather than a real host, and that name can only be resolved through the HA properties (dfs.nameservices, dfs.ha.namenodes.*, dfs.namenode.rpc-address.*) defined in hdfs-site.xml.
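As an illustration, here is a sketch of supplying those HA properties programmatically, which is what hdfs-site.xml would normally provide; mycluster, namenode1, and namenode2 are placeholder names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// All names below (mycluster, namenode1, namenode2) are placeholders.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://mycluster"); // logical nameservice, not a resolvable host
conf.set("dfs.nameservices", "mycluster");
conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1:8020");
conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2:8020");
conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
FileSystem fs = FileSystem.get(conf); // without these, "mycluster" triggers UnknownHostException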
Here's a block of code from one of my projects for building a Configuration usable for HBase, HDFS, and MapReduce. Notice that addResource will search the active classpath for the resource entries you name.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
Configuration config = new Configuration();
HBaseConfiguration.addHbaseResources(config);
config.addResource("mapred-default.xml");
config.addResource("mapred-site.xml");
My classpath definitely includes the directories housing core-site.xml, hdfs-site.xml, mapred-site.xml, and hbase-site.xml.
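As a usage sketch (not the answerer's original code), the resulting Configuration can back both an HDFS FileSystem and an HBase Connection, assuming the hbase-client jar is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

Configuration config = new Configuration();
HBaseConfiguration.addHbaseResources(config);
try (FileSystem fs = FileSystem.get(config);
     Connection hbase = ConnectionFactory.createConnection(config)) {
    // Both clients share one Configuration built from the classpath resources.
}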