
What is the HDFS Location on Hadoop?

I am trying to run the WordCount example in Hadoop after following some online tutorials. However, what's not clear to me is where the file gets copied from our local file system to HDFS when we execute the following command.

hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /usr/local/myhadoop-tmp/

When I executed the following command, I don't see my python-tutorial.pdf listed on HDFS.

hadoop fs -ls

This is confusing me. I have already specified the "myhadoop-tmp" directory in core-site.xml. I thought this directory would become the HDFS directory for storing all the input files.

core-site.xml
=============
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/myhadoop-tmp</value>
    <description>A base for other temporary directories.</description>
</property>

If this is not the case, where is HDFS located on my machine? What configuration determines the HDFS directory, and where does the input file go when we copy it from the local file system to HDFS?

This is set in the dfs.datanode.data.dir property, which defaults to file://${hadoop.tmp.dir}/dfs/data (see the hdfs-default.xml reference for details).
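As an illustration, that property could also be set explicitly in hdfs-site.xml; the path below simply spells out what the default resolves to given your hadoop.tmp.dir setting:

```xml
<property>
    <name>dfs.datanode.data.dir</name>
    <!-- When unset, this defaults to file://${hadoop.tmp.dir}/dfs/data -->
    <value>file:///usr/local/myhadoop-tmp/dfs/data</value>
    <description>Local filesystem directories where the datanode stores blocks.</description>
</property>
```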

However, in your case, the problem is that you are not using the full path within HDFS. When you run hadoop fs -ls with no path argument, it lists your HDFS home directory (typically /user/&lt;username&gt;), not the directory you copied the file into, which is why nothing shows up. Instead, do:

hadoop fs -ls /usr/local/myhadoop-tmp/

Note that you also seem to be confusing the path within HDFS with the path in your local file system. Within HDFS, your file is in /usr/local/myhadoop-tmp/. In your local system (given your configuration setting), it is under /usr/local/myhadoop-tmp/dfs/data/; in there, there's a directory structure and naming convention defined by HDFS that is independent of whatever path in HDFS you decide to use. Also, the file won't have the same name, since it is divided into blocks and each block is assigned a unique ID; the name of a block is something like blk_1073741826.
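If you are curious how HDFS mapped your file to those blocks, hdfs fsck can report the block IDs and their locations without you having to poke around in the datanode's local storage (the path below assumes the file from the question):

```shell
# Show the blocks and datanode locations backing a single HDFS file
hdfs fsck /usr/local/myhadoop-tmp/python-tutorial.pdf -files -blocks -locations
```

This is a read-only report, so it is safe to run; the block IDs it prints correspond to the blk_* files you would find under dfs/data locally.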

To conclude: the local path used by the datanode is NOT the same as the paths you use in HDFS. You can go into your local directory looking for files, but you should not do this, since you could mess up the HDFS metadata management. Just use the hadoop command-line tools to copy/move/read files within HDFS, using any logical path (in HDFS) that you wish to use. These paths within HDFS do not need to be tied to the paths you use for your local datanode storage (there is no reason to do this, and no advantage in it).
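Putting the advice above together, a typical session (reusing the paths from the question) would look like this:

```shell
# Copy the file from the local file system into an explicit HDFS path
hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /usr/local/myhadoop-tmp/

# List the destination using the full HDFS path, not a bare "hadoop fs -ls"
hadoop fs -ls /usr/local/myhadoop-tmp/

# Read the file back through HDFS rather than the datanode's local storage
hadoop fs -cat /usr/local/myhadoop-tmp/python-tutorial.pdf | head
```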

