
Access HDFS in Remote Cluster

I have a remote Hadoop cluster. When I try to read data stored on a datanode by going through the namenode, the namenode redirects me to that datanode. However, the datanode's domain name in the redirect can only be resolved inside the cluster. Furthermore, I cannot edit /etc/hosts on the client side.

Can I configure the namenode to redirect me to an IP address or domain of my choosing? Where does the namenode store the domain name that it returns?

I believe that what you need is a Gateway server (also called an EdgeNode). There are several tutorials out there.

In your particular case, the server that hosts the namenode will also host the EdgeNode.

There are two particular approaches to achieve this:

  1. Using a SOCKS proxy. See the related question "Using Hadoop through a SOCKS proxy?"
  2. Using HTTPFS: https://hadoop.apache.org/docs/r2.4.1/hadoop-hdfs-httpfs/index.html
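
For the HTTPFS option, a client-side sketch might look like the following. HttpFS exposes the WebHDFS REST API under `/webhdfs/v1` and proxies the file data itself, so the client only ever talks to the gateway host and never has to resolve a datanode name. The host `gateway.example.com`, the user `alice`, and the file path are placeholders, and port 14000 is assumed because it is the HttpFS default; adjust all of them to your setup. Authentication here is the simple `user.name` pseudo-authentication; a Kerberos-secured cluster would need something else.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def httpfs_url(host, path, op, user, port=14000, **params):
    """Build a WebHDFS-style REST URL for an HttpFS gateway.

    `path` must be an absolute HDFS path (starting with "/").
    `port` defaults to 14000, the HttpFS default (an assumption here).
    """
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def read_file(host, path, user, port=14000):
    """Fetch the contents of an HDFS file through the gateway as bytes."""
    with urlopen(httpfs_url(host, path, "OPEN", user, port)) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical host, user, and path; only the URL is printed,
    # no network call is made here.
    print(httpfs_url("gateway.example.com", "/user/alice/data.txt", "OPEN", "alice"))
```

Calling `read_file("gateway.example.com", "/user/alice/data.txt", "alice")` would then return the file contents, provided the gateway is reachable from the client and HttpFS is running on it.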
