
SSH Tunnel to access EC2 Hadoop Cluster

Background:

  1. I have installed a 3-node Cloudera Hadoop cluster on EC2 instances, which is working as expected.

  2. A client program on my Windows machine loads data from my machine into HDFS.

Details:

My client program is developed in Java; it reads data from the local Windows disk and writes it to HDFS.
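The source does not show the client itself; a minimal sketch of such a program, assuming `hadoop-client` is on the classpath and the NameNode is reachable as `hdfs://namenode-host:8020` (the host name and file paths are placeholders, not values from the question), might look like:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode (placeholder host name).
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        // Act as the remote "ubuntu" user, matching the paths in the error log.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode-host:8020"), conf, "ubuntu");
        // Copy a local Windows file into HDFS under /user/ubuntu.
        fs.copyFromLocalFile(new Path("C:/data/features.json"),
                             new Path("/user/ubuntu/features.json"));
        fs.close();
    }
}
```

This requires a reachable cluster, so it is only illustrative of the scenario described above.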

For this I am trying to create an SSH tunnel through PuTTY, and then I try to log in to the remote EC2 instance with my Windows username, which does not work. I am able to log in with the Unix username. I wanted to understand: is this the correct behavior?
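Yes, that part is expected: the EC2 instance only knows its own Unix accounts (for example `ubuntu` on Ubuntu AMIs), not your Windows username. For reference, the equivalent of PuTTY's Tunnels page from an OpenSSH command line could be sketched as follows (key path, hosts, and the NameNode's private IP are placeholders):

```
# Forward local port 8020 to the NameNode's RPC port via the EC2 host.
# -N: no remote command, tunnel only.
ssh -i my-key.pem -N -L 8020:namenode-private-ip:8020 ubuntu@ec2-public-host
```

With this in place the client would address the NameNode as `hdfs://localhost:8020`.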

I am not sure whether I have created the tunnel correctly, but when I then try to run my client program it gives me the error below:


PriviledgedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: File /user/ubuntu/features.json could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.

6:32:45.711 PM     INFO     org.apache.hadoop.ipc.Server     

IPC Server handler 13 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 108.161.91.186:54097: error: java.io.IOException: File /user/ubuntu/features.json could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
java.io.IOException: File /user/ubuntu/features.json could only be replicated to 0 nodes instead of minReplication (=1).  There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1331)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
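One common cause of "replicated to 0 nodes ... 3 node(s) are excluded" when writing through a tunnel is that the client can reach the NameNode but not the DataNodes: the NameNode hands back the DataNodes' private EC2 addresses, the client fails to connect to each one, and all three end up excluded. A client-side configuration sketch that asks HDFS to advertise DataNodes by hostname instead of IP (so that resolvable names can be used) would be, assuming a CDH4-era `hdfs-site.xml`:

```
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

The DataNode ports would still need to be reachable from the client (security group rules or additional tunnels).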

Any idea?

You can verify the health of the HDFS cluster with hdfs fsck / -delete, and then you can rebalance your datanodes.
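As a sketch, the suggested health check and rebalance would be run from a cluster node (the threshold value is an illustrative choice, not from the answer):

```
# Report overall HDFS health, missing/corrupt blocks, replication status.
hdfs fsck /
# -delete removes corrupt files, so use it only after reviewing the report.
hdfs fsck / -delete
# Rebalance block distribution across datanodes (utilization threshold in %).
hdfs balancer -threshold 10
```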


 