
Access WebHDFS on Hortonworks Hadoop (AWS EC2)

I'm having trouble accessing WebHDFS on my Amazon EC2 machine, which runs Hortonworks HDP 2.3.

From my local machine I can retrieve the file status in the browser (Chrome) with the following HTTP request:

http://<serverip>:50070/webhdfs/v1/user/admin/file.csv?op=GETFILESTATUS

This works fine, but if I try to open the file with ?op=OPEN, it redirects me to the private DNS name of the machine, which I cannot access:

http://<privatedns>:50075/webhdfs/v1/user/admin/file.csv?op=OPEN&namenoderpcaddress=<privatedns>:8020&offset=0

I also tried to access WebHDFS from the AWS machine itself with this command:

[ec2-user@<ip> conf]$ curl -i http://localhost:50070/webhdfs/v1/user/admin/file.csv?op=GETFILESTATUS
curl: (7) couldn't connect to host

Does anyone know why I cannot connect to localhost, or why the OPEN request from my local machine does not work? Unfortunately, I couldn't find any tutorial on configuring WebHDFS for an Amazon machine.

Thanks in advance.

What happens is that the namenode redirects you to the datanode. It seems you installed a single-node cluster, but conceptually the namenode and the datanode(s) are distinct, and in your configuration the datanode(s) listen on the private side of your EC2 VPC.
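To see that redirect for yourself, here is a minimal Python sketch (standard library only) that sends the OPEN request to the namenode but refuses to follow the redirect, so you can read the Location header it hands back. The helper names (`NoRedirect`, `build_open_url`, `datanode_location`) are made up for illustration, and port 50070 is simply the one from your question.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response
        # instead of silently requesting the new URL.
        return None


def build_open_url(namenode_host, hdfs_path, port=50070):
    """Build the WebHDFS OPEN URL served by the namenode (port from the question)."""
    return f"http://{namenode_host}:{port}/webhdfs/v1{hdfs_path}?op=OPEN"


def datanode_location(open_url, timeout=10):
    """Return the Location header of the namenode's redirect, or None if there is none."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(open_url, timeout=timeout):
            return None  # the namenode answered directly, no redirect
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307):
            return err.headers.get("Location")
        raise  # a real error (403, 404, ...), not a redirect
```

Calling `datanode_location(build_open_url("<serverip>", "/user/admin/file.csv"))` from your local machine should hand back the `http://<privatedns>:50075/...` URL from your question, which confirms that the namenode is reachable and only the datanode address is the problem.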

You could reconfigure your cluster to host the datanodes on the public IP/DNS (see HDFS Support for Multihomed Networks), but I would not go that way. I think the proper solution is to add a Knox gateway, which is a specialized component for accessing a private cluster through a public API. Specifically, you will have to configure the datanode URLs; see Chapter 5. Mapping the Internal Nodes to External URLs. The example there seems spot on for your case:

For example, when uploading a file with WebHDFS service:

  • The external client sends a request to the gateway WebHDFS service.

  • The gateway proxies the request to WebHDFS using the service URL.

  • WebHDFS determines which DataNodes to create the file on and returns the path for the upload as a Location header in a HTTP redirect, which contains the datanode host information.

  • The gateway augments the routing policy based on the datanode hostname in the redirect by mapping it to the externally resolvable hostname.

  • The external client continues to upload the file through the gateway.

  • The gateway proxies the request to the datanode by using the augmented routing policy.

  • The datanode returns the status of the upload and the gateway again translates the information without exposing any internal cluster details.
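The hostname-mapping step in that list can be illustrated with a small Python sketch. This is not how Knox is implemented or configured; it only shows the idea of swapping an internal datanode hostname in a redirect URL for an externally resolvable one. The function name `map_to_external` and both hostnames in `HOST_MAP` are hypothetical placeholders.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from internal (private) hostnames to external ones.
HOST_MAP = {
    "ip-10-0-0-12.ec2.internal": "ec2-203-0-113-12.compute-1.amazonaws.com",
}


def map_to_external(location, host_map):
    """Rewrite the host of a redirect URL if it has an external mapping.

    Only the network location changes; path and query are kept as-is, so a
    namenoderpcaddress parameter still names the internal host, which the
    datanode resolves on its own side.
    """
    parts = urlsplit(location)
    external = host_map.get(parts.hostname)
    if external is None:
        return location  # unknown host: leave the URL untouched
    netloc = external if parts.port is None else f"{external}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

A gateway does this on behalf of every client, which is why it is preferable to patching URLs in each client: the internal names never leave the cluster, and the datanode ports do not have to be opened to the outside.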


 