
How does Hadoop work? How does a client connect to a Hadoop cluster?

I have a basic understanding of Hadoop. My question is about how a client/developer connects to a Hadoop cluster to run queries.

For example, say I am a Hadoop developer and the Hadoop cluster is in some remote location. How do I connect to the cluster to run my Java code? Do I have to install Hadoop on my laptop as well (for which I would have to run Linux)?

Or is it enough to be on the same network as the Hadoop cluster, mount a share on my laptop, and put my code onto the cluster that way?

Second question: to run my Java code, do I have to SSH into a data node and launch the job there?

These two questions have been nagging me; I don't have any real-world experience.

Thank you in advance!

To open a file, a client contacts the NameNode and retrieves a list of locations for the blocks that comprise the file. These locations identify the DataNodes which hold each block. Clients then read file data directly from the DataNode servers, possibly in parallel. The NameNode is not directly involved in this bulk data transfer, keeping its overhead to a minimum.
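As a minimal sketch of that read path in Java (assuming Hadoop 1.x-style configuration; the NameNode address hdfs://namenode:9000 and the file path /user/me/data.txt are placeholders), the standard FileSystem API hides the NameNode lookup and the direct DataNode reads behind a single open() call:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder address: points the client at the remote NameNode
            // ("fs.defaultFS" on newer Hadoop releases).
            conf.set("fs.default.name", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            // open() asks the NameNode only for the block locations; the
            // returned stream then reads the bytes straight from the DataNodes.
            try (FSDataInputStream in = fs.open(new Path("/user/me/data.txt"));
                 BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }

Note that only the small block-location metadata passes through the NameNode; the bulk data transfer bypasses it entirely, which is why it stays lightweight.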

I don't think you have the full picture of a Hadoop cluster yet. The following link explains how a Hadoop cluster and its network are put together:

http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

As far as I know, you do not need Hadoop installed on your laptop to run a job on a remote Hadoop cluster. You just need remote access to the JobTracker so you can submit the job.
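As a rough sketch of such a remote submission (assuming a Hadoop 1.x cluster; the NameNode and JobTracker addresses and the HDFS paths below are placeholders), a driver class on your laptop needs only the Hadoop client libraries on its classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RemoteJobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder addresses for the remote cluster's NameNode and JobTracker.
            conf.set("fs.default.name", "hdfs://namenode:9000");
            conf.set("mapred.job.tracker", "jobtracker:9001");

            // No mapper/reducer set: Hadoop falls back to the identity classes,
            // so this job simply copies its input records to the output path.
            Job job = new Job(conf, "pass-through job");
            job.setJarByClass(RemoteJobDriver.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path("/user/me/input"));
            FileOutputFormat.setOutputPath(job, new Path("/user/me/output"));

            // The client ships the job JAR and configuration to the JobTracker;
            // nothing runs locally beyond this submission step.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }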

For the second point, "is it OK if I am in the same network as the Hadoop cluster and simply mount the share on my laptop and put my code into the Hadoop cluster?":

  • Putting your code into the Hadoop cluster must go through the right channel, i.e. through the master node. In Hadoop you submit your data and code to the master node, and it is the master node's job to distribute them across the cluster.

  • For running Java code, do I have to SSH to a data node and then run the job? ==> You would SSH to the JobTracker (the master), not to a DataNode. DataNodes are the slaves that store data; the JobTracker is the master that assigns work across the cluster. A submission command is sketched below.
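For example, once your job JAR has been copied to the master node (or to any machine with Hadoop client access), the standard hadoop jar command submits it from an SSH session there; the JAR name, driver class, and HDFS paths here are placeholders:

    hadoop jar myjob.jar com.example.MyDriver /user/me/input /user/me/output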
