Trouble using hbase from java on Amazon EMR

So I'm trying to query my HBase cluster on Amazon EC2 using a custom jar that I launch as a MapReduce step. In my jar (inside the map function) I call HBase like so:

public void map(Text key, BytesWritable value, Context context) throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tablename");
      ...

The problem is that when it gets to that HTable line and tries to connect to HBase, the step fails and I get the following errors:

2014-02-28 18:00:49,936 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
2014-02-28 18:00:49,974 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 5119@ip-10-0-35-130.ec2.internal
2014-02-28 18:00:49,998 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-02-28 18:00:50,005 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

      ...

2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-02-28 18:01:05,542 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

      ... and on and on

I can use the hbase shell just fine, and can query data and everything from the shell. I have no clue where to start, and I've been googling for hours with no luck. Most of the problems like this on the internet don't talk about Amazon-specific fixes. I thought ZooKeeper and HBase should automatically be connected properly by the Amazon bootstrap.

I'm using the hbase 0.94.17 jar and Amazon is running HBase 0.94.7; I'm pretty sure that's not the problem. I'm guessing it's more that I'm not setting up the Java code correctly. If anyone can help with this it'd be greatly appreciated. Thanks.

Well, after almost 30 hours of trying, I've found the solution. There are many caveats to this, and versions are important.

In this case I'm using Amazon EMR hadoop2 (AMI 3.0.4) with HBase 0.94.7, and I'm trying to run a custom jar on the same cluster to access HBase locally through Java.

So, the first thing is that the default HBase config will not work because of the external/internal IP idiosyncrasies that EC2 faces, so you can't use HBaseConfiguration (because it defaults to a localhost quorum). What you'll have to do is use the configuration that Amazon sets up for you (located in /home/hadoop/hbase/conf/hbase-site.xml) and just manually add it to a blank Configuration object.

The connection code looks like this:

Configuration conf = new Configuration();
conf.addResource("/home/hadoop/hbase/conf/hbase-site.xml");
HBaseAdmin.checkHBaseAvailable(conf);
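For reference, here is a slightly fuller, self-contained sketch of that idea (the class name, table name and row key are just placeholders, not from my actual job). It uses the Path overload of addResource so the absolute path is read from the local filesystem rather than looked up on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseOnEmrCheck {
    public static void main(String[] args) throws Exception {
        // Start from an empty Configuration so the localhost ZooKeeper
        // defaults never apply, then layer in Amazon's cluster settings.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/home/hadoop/hbase/conf/hbase-site.xml"));

        // Fail fast if the ZooKeeper quorum from hbase-site.xml is unreachable.
        HBaseAdmin.checkHBaseAvailable(conf);

        // "tablename" and the row key are placeholders for whatever you query.
        HTable table = new HTable(conf, "tablename");
        try {
            Result row = table.get(new Get(Bytes.toBytes("some-row-key")));
            System.out.println("Fetched row, empty=" + row.isEmpty());
        } finally {
            table.close();
        }
    }
}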

Secondly, you have to use the correct HBase jar PACKAGED into your custom jar. The reason is that hbase 0.94.x is compiled by default against hadoop1, so you have to grab the Cloudera HBase jar named hbase-0.94.6-cdh4.3.0.jar (you can find this online), which has been compiled against hadoop2. If you don't do this part you will get many nasty, un-googleable errors, including the org.apache.hadoop.net.NetUtils exception.
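One quick sanity check (just a suggestion, not part of the original fix): from inside the step you can log which Hadoop and HBase versions are actually on the classpath before touching HBase, which makes it obvious whether the cdh4 client jar really got packaged:

import org.apache.hadoop.util.VersionInfo;

// Hadoop version on the runtime classpath (should be the cluster's hadoop2 build).
System.out.println("Hadoop: " + VersionInfo.getVersion());
// HBase client version packaged into the custom jar; the Cloudera build reports
// something like "0.94.6-cdh4.3.0" instead of a stock "0.94.x" string.
System.out.println("HBase client: " + org.apache.hadoop.hbase.util.VersionInfo.getVersion());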
