
Access to HDFS files from all computers of a cluster

My Hadoop program was originally launched in local mode, and now my goal is to run it in fully distributed mode. For this purpose, the files that are read in the mapper and reducer functions must be accessible from all computers of the cluster, which is why I asked a question at http://answers.mapr.com/questions/4444/syntax-of-option-files-in-hadoop-script (also, since it is not known in advance on which computer the mapper function will be executed (by the program's logic there is only one mapper, and the program is launched with only one mapper), access to the file arriving at the mapper's input must likewise be provided across the whole cluster).

In this regard I have a question: is it possible to use HDFS files directly, that is, to copy the files beforehand from the local Linux file system into HDFS (whereby, as I assume, these files become available on all computers of the cluster; if that is not so, please correct me), and then use the HDFS Java API to read these files in the mapper and reducer functions that execute on the computers of the cluster?

If the answer is yes, please give an example of copying files from the Linux file system into HDFS, reading those files in the Java program by means of the HDFS Java API, and storing their contents in a Java string.

Copy all your input files to the master node (this can be done using scp). Then log in to your master node (ssh) and execute something like the following to copy the files from the local filesystem to HDFS:

hadoop fs -put $localfilelocation $destination
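If you would rather do the same copy programmatically, a minimal sketch with the HDFS Java API could look like the one below. The local and HDFS paths are placeholders, and FileSystem.get picks up the cluster address from the core-site.xml on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Connects to the file system named in fs.defaultFS (HDFS on a configured cluster).
            FileSystem fs = FileSystem.get(conf);
            // Upload a file from the local Linux file system into HDFS;
            // once in HDFS it is visible to every node of the cluster.
            fs.copyFromLocalFile(new Path("/home/user/data/side-file.txt"),   // local path (placeholder)
                                 new Path("/user/hadoop/side-file.txt"));     // HDFS path (placeholder)
            fs.close();
        }
    }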

Now in your Hadoop jobs, you can use hdfs:///$destination as the input. There is no need to use any extra API to read from HDFS.
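As an illustration, a minimal job driver that points its input at such an HDFS path might look like the following sketch (class name, job name, and paths are placeholders; with no mapper or reducer set, Hadoop's identity implementations are used):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HdfsInputDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "hdfs-input-example");
            job.setJarByClass(HdfsInputDriver.class);
            // The input directory lives in HDFS, so every mapper in the cluster can read its split.
            FileInputFormat.addInputPath(job, new Path("hdfs:///user/hadoop/input"));
            FileOutputFormat.setOutputPath(job, new Path("hdfs:///user/hadoop/output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }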

If you really want to read files from HDFS and use them as additional information beyond the input files, then refer to this.
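As a rough illustration of that approach (not the linked code; the class name and path are hypothetical), reading a whole HDFS file into a Java String with the FileSystem API could look like this:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsTextReader {
        // Reads the entire file at the given HDFS path into a String.
        public static String readToString(Configuration conf, String hdfsPath) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            StringBuilder contents = new StringBuilder();
            try (FSDataInputStream in = fs.open(new Path(hdfsPath));
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    contents.append(line).append('\n');
                }
            }
            return contents.toString();
        }
    }

Inside a mapper or reducer you would typically call such a helper from setup(), passing context.getConfiguration(), so the file is read once per task rather than once per record.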
