
How does Hadoop work? How is a client connected to Hadoop?

I have a basic understanding of Hadoop. My question is about how a client/developer connects to a Hadoop cluster to run queries.

For example, I am a Hadoop developer and the Hadoop cluster is in some remote location. How do I connect to the Hadoop cluster to run my Java code? Do I also have to install Hadoop on my laptop (for which I would have to run Linux)?

Or is it OK, if I am on the same network as the Hadoop cluster, to simply mount a share on my laptop and put my code onto the Hadoop cluster?

Second question: to run Java code, do I have to SSH into some data node and then run the job?

The above two questions are haunting me. I don't have any hands-on experience.

Thank you in advance!

To open a file, a client contacts the NameNode and retrieves a list of locations for the blocks that comprise the file. These locations identify the DataNodes which hold each block. Clients then read file data directly from the DataNode servers, possibly in parallel. The NameNode is not directly involved in this bulk data transfer, which keeps its overhead to a minimum.
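As an illustration, here is a minimal sketch of that read path using Hadoop's Java FileSystem API. The NameNode address and file path are placeholders, and it assumes a Hadoop 2.x-style configuration key (fs.defaultFS); none of these details come from the question.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder address of the remote NameNode.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);

            // open() asks the NameNode for the block locations of the file,
            // then streams the bytes directly from the DataNodes that hold
            // those blocks; the NameNode never sees the data itself.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(new Path("/user/demo/input.txt"))))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
            fs.close();
        }
    }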

I think you don't yet have a proper picture of a Hadoop cluster. Follow this link and you will fully understand how a Hadoop cluster is laid out:

http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

As far as I know, having Hadoop installed on your laptop is not necessary to run your job on some Hadoop cluster. You just have to get remote access to the JobTracker and submit the job.
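To make that concrete, here is a minimal sketch of a client-side job submission with the standard MapReduce Java API. It assumes a classic (pre-YARN) cluster with a JobTracker; the cluster addresses, paths, and the word-count mapper/reducer are illustrative placeholders, not details from the original answer. The point is only that the client sets the remote cluster's addresses in its Configuration and submits the job, without running any Hadoop daemons locally.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RemoteWordCount {

        // Minimal mapper: emit (word, 1) for every whitespace-separated token.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        ctx.write(word, ONE);
                    }
                }
            }
        }

        // Minimal reducer: sum the counts for each word.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder addresses: point the client at the remote cluster
            // instead of relying on a local Hadoop installation.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
            conf.set("mapred.job.tracker", "jobtracker.example.com:8021");

            Job job = Job.getInstance(conf, "remote word count");
            job.setJarByClass(RemoteWordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
            FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

            // waitForCompletion() ships the job jar plus configuration to the
            // cluster's master and blocks until the job finishes.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

On a YARN-based cluster the JobTracker setting would be replaced by the corresponding YARN configuration, but the client-side idea is the same: configure the remote addresses and submit.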

For the second point, "is it OK if I am on the same network as the Hadoop cluster and simply mount the share on my laptop and put my code into the Hadoop cluster?":

  • Putting your code into the Hadoop cluster must go through the right channel, i.e. through the master node. In Hadoop you have to submit your data and code to the master node, and it is the master's duty to distribute them across the cluster (a sketch follows this list).

  • For running Java code, do I have to SSH to any data node and then run the job? ==> You will have to SSH to the JobTracker, not a DataNode. DataNodes are the slaves that store data; the JobTracker is the master that allots jobs across the cluster.
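As a sketch of the "through the master node" point above: the snippet below uses the Java FileSystem API with placeholder hostnames and paths to push input data into HDFS. The client only talks to the NameNode's namespace; the NameNode then decides which DataNodes receive and replicate the blocks, so the client never picks a slave itself. (The `hadoop fs -put` shell command does the equivalent from a terminal.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UploadToCluster {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder address of the remote NameNode (the HDFS master).
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);

            // copyFromLocalFile() writes the local file into the HDFS namespace;
            // the NameNode chooses the DataNodes that store and replicate each
            // block, so the upload goes "through the master", not to a slave.
            fs.copyFromLocalFile(
                    new Path("/home/me/data/input.txt"),       // local source
                    new Path("/user/demo/input/input.txt"));   // HDFS destination
            fs.close();
        }
    }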
