How to read and write HBase in a Flink streaming job

Question

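If we have to read from and write to HBase in a streaming application, how should we do it? We open a connection for writing in the open method, but how do we open a connection for reading?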

import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object test {

  def main(args: Array[String]): Unit = {

    if (args.length != 11) {
      //print args
      System.exit(1)
    }

    val Array() = args
    println("Parameters Passed " + ...)

    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Kafka consumer configuration
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", metadataBrokerList)
    properties.setProperty("zookeeper.connect", zkQuorum)
    properties.setProperty("group.id", group)

    val messageStream = env.addSource(new FlinkKafkaConsumer08[String](topics, new SimpleStringSchema(), properties))

    messageStream.map { x => getheader(x) }

    // Writing already works: the sink opens its HBase connection in open()
    messageStream.writeUsingOutputFormat(new HBaseOutputFormat())
    env.execute()
  }

  def getheader(a: String): Unit = {
    //Get header and parse and split the headers
    if (metadata not available hit HBASE) { //Device Level send(Just JSON)
      //If the resultset is not available in Map, fetch from Phoenix
      //How to read from HBASE here?
    } else {
      //fetch from cache
    }
  }
}

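Now, inside the method getheader, how do I read from HBase in the `if (metadata not available hit HBASE)` branch? I don't want to open a connection there. The idea is to maintain a single connection per thread and reuse it, just as Flink does for the HBase sink with the open() method, or as Spark does with foreachPartition. I tried, but I could not pass the StreamExecutionEnvironment into the method. How can I achieve this? Could someone provide a snippet?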

Answer 1

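You want to read from and write to Apache HBase from within a streaming user function. The HBaseReadExample you linked does something different: it reads an HBase table into a DataSet (Flink's batch-processing abstraction). Using that code inside a user function would mean starting a Flink program from within a Flink program.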

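For your use case, you need to create an HBase client directly in your user function and interact with it. The best way to do this is to use a RichFlatMapFunction and create the connection to HBase in its open() method. open() is called once per parallel instance of the function, so the connection is created once and then reused for every record that instance processes.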
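The approach described above could look roughly like the following minimal sketch, in Scala to match the question's code. It assumes the Flink 1.x `RichFlatMapFunction` lifecycle (`open(Configuration)`, `flatMap`, `close()`) and the HBase 1.x/2.x `ConnectionFactory` client. The class name `HBaseLookupFunction`, the comma-based `extractKey` helper, the table/column names and the unbounded in-memory cache are hypothetical placeholders for the question's header parsing and its "fetch from cache" branch.

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Get, Table}
import org.apache.hadoop.hbase.util.Bytes

import scala.collection.mutable

// Hypothetical enrichment function: emits (message, metadata) pairs, looking the
// metadata up in a local cache first and in HBase only on a cache miss.
class HBaseLookupFunction(tableName: String, family: String, qualifier: String)
    extends RichFlatMapFunction[String, (String, String)] {

  // Created in open() on the task manager; @transient so they are never serialized
  // together with the function when the job is submitted.
  @transient private var connection: Connection = _
  @transient private var table: Table = _
  // Placeholder cache: unbounded, one per parallel instance.
  @transient private var cache: mutable.Map[String, String] = _

  override def open(parameters: Configuration): Unit = {
    // HBaseConfiguration.create() picks up hbase-site.xml from the classpath.
    connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    table = connection.getTable(TableName.valueOf(tableName))
    cache = mutable.HashMap.empty[String, String]
  }

  override def flatMap(message: String, out: Collector[(String, String)]): Unit = {
    val key = HBaseLookupFunction.extractKey(message)
    val metadata = cache.get(key).orElse {
      // Cache miss: blocking point lookup against HBase.
      val result = table.get(new Get(Bytes.toBytes(key)))
      // getValue returns null when the row or cell does not exist.
      val fetched = Option(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)))
        .map(bytes => Bytes.toString(bytes))
      fetched.foreach(v => cache.put(key, v))
      fetched
    }
    // Records without metadata are dropped in this sketch.
    metadata.foreach(m => out.collect((message, m)))
  }

  override def close(): Unit = {
    if (table != null) table.close()
    if (connection != null) connection.close()
  }
}

object HBaseLookupFunction {
  // Hypothetical key extraction: treat everything before the first comma as the row key.
  def extractKey(message: String): String = message.takeWhile(_ != ',')
}
```

In the question's job this would replace `messageStream.map { x => getheader(x) }` with something like `messageStream.flatMap(new HBaseLookupFunction("device_metadata", "m", "header"))` (table and column names hypothetical). Each parallel subtask opens exactly one connection, which matches the "one connection per thread, reused" idea from the question; a real job would also want a size-bounded cache.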

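The next Flink release (1.2.0) will support asynchronous I/O operations in user functions. Because the function then no longer blocks while waiting for each HBase response, this can significantly improve your application's throughput.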
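As a rough idea of what such an asynchronous lookup might look like, here is a sketch. It is written against the later Flink 1.x Scala async API (`AsyncFunction`, `ResultFuture`, `AsyncDataStream.unorderedWait`), whose interface differs from the one first shipped in 1.2.0, and against the HBase 2.x asynchronous client (`AsyncConnection`, `AsyncTable`). The class name `AsyncHBaseLookup`, the key extraction, the table/column names, the timeout and the capacity are all hypothetical.

```scala
import java.util.concurrent.TimeUnit
import java.util.function.BiConsumer

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.async.{AsyncFunction, ResultFuture}
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{AsyncConnection, ConnectionFactory, Get, Result}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical non-blocking lookup using the HBase 2.x async client.
class AsyncHBaseLookup(tableName: String, family: String, qualifier: String)
    extends AsyncFunction[String, (String, String)] {

  // Initialized lazily on the task manager on first use; @transient so it is not serialized.
  // NOTE: this sketch never closes the connection.
  @transient private lazy val connection: AsyncConnection =
    ConnectionFactory.createAsyncConnection(HBaseConfiguration.create()).get()

  @transient private lazy val table = connection.getTable(TableName.valueOf(tableName))

  override def asyncInvoke(message: String, resultFuture: ResultFuture[(String, String)]): Unit = {
    val key = message.takeWhile(_ != ',') // hypothetical key extraction
    // get() returns a CompletableFuture immediately; the callback completes the Flink result.
    table.get(new Get(Bytes.toBytes(key))).whenComplete(new BiConsumer[Result, Throwable] {
      override def accept(result: Result, error: Throwable): Unit =
        if (error != null) resultFuture.completeExceptionally(error)
        else {
          val value = Option(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)))
          // An empty list means "no output for this record".
          resultFuture.complete(value.map(v => (message, Bytes.toString(v))).toList)
        }
    })
  }
}

object AsyncHBaseLookup {
  // Hypothetical wiring: 1 s timeout, at most 100 in-flight lookups per subtask.
  def enrich(messages: DataStream[String]): DataStream[(String, String)] =
    AsyncDataStream.unorderedWait(
      messages, new AsyncHBaseLookup("device_metadata", "m", "header"),
      1000, TimeUnit.MILLISECONDS, 100)
}
```

`unorderedWait` emits each result as soon as its lookup finishes; `orderedWait` keeps the input order at the cost of extra latency. A production version would also need to release the `AsyncConnection` when the task shuts down.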

Answer 1: score 3, accepted, 2016-10-27 08:08:12
