
SSH Tunnel to access EC2 Hadoop Cluster

Background:

  1. I have installed a 3-node Cloudera Hadoop cluster on EC2 instances, which is working as expected.

  2. A client program on my Windows machine loads data from my machine into HDFS.

Details:

My client program is developed in Java; it reads data from the local Windows disk and writes it to HDFS.
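The source does not show the client itself; a minimal sketch of such a program, assuming `hadoop-client` is on the classpath and the NameNode is reachable as `hdfs://namenode-host:8020` (the host name and file paths are placeholders, not values from the question), might look like:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode (placeholder host name).
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        // Act as the remote "ubuntu" user, matching the paths in the error log.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode-host:8020"), conf, "ubuntu");
        // Copy a local Windows file into HDFS under /user/ubuntu.
        fs.copyFromLocalFile(new Path("C:/data/features.json"),
                             new Path("/user/ubuntu/features.json"));
        fs.close();
    }
}
```

This requires a reachable cluster, so it is only illustrative of the scenario described above.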

For this I am trying to create an SSH tunnel through PuTTY, and then I try to log in to the remote EC2 instance with my Windows username, which does not work. I am able to log in with the Unix username. I wanted to understand: is this the correct behavior?
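Yes, that part is expected: the EC2 instance only knows its own Unix accounts (for example `ubuntu` on Ubuntu AMIs), not your Windows username. For reference, the equivalent of PuTTY's Tunnels page from an OpenSSH command line could be sketched as follows (key path, hosts, and the NameNode's private IP are placeholders):

```
# Forward local port 8020 to the NameNode's RPC port via the EC2 host.
# -N: no remote command, tunnel only.
ssh -i my-key.pem -N -L 8020:namenode-private-ip:8020 ubuntu@ec2-public-host
```

With this in place the client would address the NameNode as `hdfs://localhost:8020`.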

I am not sure whether I have created the tunnel correctly, but when I then try to run my client program it gives me the error below:


PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: File /user/ubuntu/features.json could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.

6:32:45.711 PM     INFO     org.apache.hadoop.ipc.Server     

IPC Server handler 13 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 108.161.91.186:54097: error: java.io.IOException: File /user/ubuntu/features.json could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
java.io.IOException: File /user/ubuntu/features.json could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1331)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
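One common cause of "replicated to 0 nodes ... 3 node(s) are excluded" when writing through a tunnel is that the client can reach the NameNode but not the DataNodes: the NameNode hands back the DataNodes' private EC2 addresses, the client fails to connect to each one, and all three end up excluded. A client-side configuration sketch that asks HDFS to advertise DataNodes by hostname instead of IP (so that resolvable names can be used) would be, assuming a CDH4-era `hdfs-site.xml`:

```
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

The DataNode ports would still need to be reachable from the client (security group rules or additional tunnels).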

Any idea?

You can verify the health of the HDFS cluster with hdfs fsck / -delete, and then you can rebalance your datanodes.
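As a sketch, the suggested health check and rebalance would be run from a cluster node (the threshold value is an illustrative choice, not from the answer):

```
# Report overall HDFS health, missing/corrupt blocks, replication status.
hdfs fsck /
# -delete removes corrupt files, so use it only after reviewing the report.
hdfs fsck / -delete
# Rebalance block distribution across datanodes (utilization threshold in %).
hdfs balancer -threshold 10
```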


 