
Linking Spark Streaming to HBase

I'm new to Spark and HBase, but I need to link the two together. I tried the spark-hbase-connector library, but when I run it with spark-submit it doesn't work, even though no error is shown. I searched here and elsewhere for a similar problem or a tutorial but couldn't find one. Could anyone explain how to write to HBase from Spark Streaming, or recommend a tutorial or a book? Thank you in advance.

What finally worked was:

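```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
// `row` and `value` (the row key and the cell value, e.g. Strings) are assumed to be defined
val hconf = HBaseConfiguration.create() // reads hbase-site.xml from the classpath
val hTable = new HTable(hconf, "mytab")
val thePut = new Put(Bytes.toBytes(row))
thePut.add(Bytes.toBytes("colfamily"), Bytes.toBytes("c1"), Bytes.toBytes(value))
hTable.put(thePut)
hTable.close() // release the table's resources when done
```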
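The snippet above writes one row at a time. Inside a Spark Streaming job, the pattern recommended in the Spark Streaming Programming Guide (section "Design Patterns for using foreachRDD") is to create the connection inside `foreachRDD` → `foreachPartition`, once per partition on the executor, rather than on the driver (connection objects such as `HTable` are not serializable) or once per record (connections are expensive to open). Also note that `new HTable(...)` and `Put.add(...)` belong to the older client API: both are deprecated as of HBase 1.0 in favour of `ConnectionFactory.createConnection(conf).getTable(TableName.valueOf("mytab"))` and `Put.addColumn(...)`. Because HBase itself cannot be assumed here, the following is only a minimal JDK-only sketch of the per-partition pattern: `RowSink` is a hypothetical stand-in for an HBase `Table`, and `PerPartitionWriter`/`writePartition` are illustrative names, not part of Spark or HBase.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.Supplier;

public class PerPartitionWriter {

    // Hypothetical stand-in for an HBase Table: accepts (rowKey, value) writes
    // and must be closed to release the underlying connection.
    public interface RowSink extends AutoCloseable {
        void put(String rowKey, String value);

        @Override
        void close(); // narrowed so it throws no checked exception
    }

    // Writes every record of ONE partition through ONE sink, then closes it.
    // In a Spark job this body would run inside the function passed to
    // rdd.foreachPartition(...), so the sink is created on the executor
    // instead of being serialized from the driver.
    public static int writePartition(Iterator<Map.Entry<String, String>> records,
                                     Supplier<RowSink> openSink) {
        int written = 0;
        try (RowSink sink = openSink.get()) { // opened once per partition
            while (records.hasNext()) {
                Map.Entry<String, String> record = records.next();
                sink.put(record.getKey(), record.getValue());
                written++;
            }
        } // close() runs here even if a put throws
        return written;
    }
}
```

In a real job, the `Supplier` would open the HBase table as in the snippet above, and `writePartition` would be called from the function passed to `foreachPartition`, so exactly one connection is opened and closed per partition per batch.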

Here is some sample code that uses Splice Machine (open source) to store data in HBase via Spark Streaming and Kafka:

https://github.com/splicemachine/splice-community-sample-code/tree/master/tutorial-kafka-spark-streaming

We fought through this as well, and we know it can be a bit daunting.

Here is the relevant code:

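```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils; // spark-streaming-kafka (0.8 receiver API)

// Body of SparkStreamingKafka.processKafka(). LOG, jssc, zookeeper, group, topics,
// numThreads, numSeconds and checkpointDirectory are fields of the enclosing class;
// TupleFunction and SaveRDDWithVTI are helper classes from the linked sample project.
LOG.info("************ SparkStreamingKafka.processKafka start");

// Create the Spark application and set its name to KAFKA
SparkConf sparkConf = new SparkConf().setAppName("KAFKA");

// Create the streaming context with a batch interval of 'numSeconds' seconds
jssc = new JavaStreamingContext(sparkConf, Durations.seconds(numSeconds));
jssc.checkpoint(checkpointDirectory);

LOG.info("zookeeper:" + zookeeper);
LOG.info("group:" + group);
LOG.info("numThreads:" + numThreads);
LOG.info("numSeconds:" + numSeconds);

// 1. Map each topic to the number of receiver threads that consume it
Map<String, Integer> topicMap = new HashMap<>();
for (String topic : topics) {
    LOG.info("topic:" + topic);
    topicMap.put(topic, numThreads);
}

LOG.info("************ SparkStreamingKafka.processKafka about to read the KafkaUtils.createStream");
// 2. Use KafkaUtils to collect Kafka messages as (key, message) pairs
JavaPairDStream<String, String> messages = KafkaUtils.createStream(jssc, zookeeper, group, topicMap);

// Convert each tuple into a single string: we want the second element (the message)
JavaDStream<String> lines = messages.map(new TupleFunction());

LOG.info("************ SparkStreamingKafka.processKafka about to do foreachRDD");
// Process the messages on the queue and save them to the database
lines.foreachRDD(new SaveRDDWithVTI());

LOG.info("************ SparkStreamingKafka.processKafka prior to context.start");
// Start the context and block until the streaming job is stopped
jssc.start();
jssc.awaitTermination();
```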
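`TupleFunction` and `SaveRDDWithVTI` are not shown in the answer; they are part of the linked sample project. Judging by the comment, `TupleFunction` keeps only the second element of each `(key, message)` pair produced by the receiver-based `KafkaUtils.createStream`, i.e. the message body. In Spark's Java API such a class would implement `org.apache.spark.api.java.function.Function<Tuple2<String, String>, String>` and return `tuple._2()`. The sketch below models just that mapping with the JDK, assuming `Map.Entry` as a stand-in for `scala.Tuple2`; `TupleFunctionSketch`, `SECOND` and `toLines` are hypothetical names, not the sample's real code.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TupleFunctionSketch {

    // Assumed equivalent of TupleFunction: (key, message) -> message.
    // The Kafka message key is discarded.
    public static final Function<Map.Entry<String, String>, String> SECOND =
            Map.Entry::getValue;

    // What messages.map(new TupleFunction()) would do to one batch, modelled on a List.
    public static List<String> toLines(List<Map.Entry<String, String>> batch) {
        return batch.stream().map(SECOND).collect(Collectors.toList());
    }
}
```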

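Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.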
