
Get topic from kafka message

How can I identify the topic name from a message in Kafka?

String[] topics = { "test", "test1", "test2" };
Map<String, Integer> topicMap = new HashMap<String, Integer>();
for (String t : topics) {
    topicMap.put(t, new Integer(3));
}

SparkConf conf = new SparkConf().setAppName("KafkaReceiver")
        .set("spark.streaming.receiver.writeAheadLog.enable", "false")
        .setMaster("local[4]")
        .set("spark.cassandra.connection.host", "localhost");
final JavaSparkContext sc = new JavaSparkContext(conf);
JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(1000));

/* Receive Kafka streaming inputs */
JavaPairReceiverInputDStream<String, String> messages = KafkaUtils
        .createStream(jssc, "localhost:2181", "test-group", topicMap);

JavaDStream<MessageAndMetadata> data =
        messages.map(new Function<Tuple2<String, String>, MessageAndMetadata>() {

            public MessageAndMetadata call(Tuple2<String, String> message) {
                // Only the (key, value) pair reaches this handler;
                // the topic name is not available here.
                System.out.println("message =" + message._2);
                return null;
            }
        });

I can fetch messages from the Kafka producer, but since the consumer now consumes from three topics, I need to identify the topic name for each message.

As of Spark 1.5.0, the official documentation encourages using the no-receiver/direct approach, which graduated from experimental status in that release. Among other benefits, this new Direct API lets you easily obtain a message together with its metadata.
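For illustration, here is a minimal sketch of the Direct API's messageHandler variant in Java (the broker address, a single partition per topic, and starting offsets of 0 are assumptions made here; this overload requires explicit starting offsets):

import java.util.HashMap;
import java.util.Map;

import kafka.common.TopicAndPartition;
import kafka.message.MessageAndMetadata;
import kafka.serializer.StringDecoder;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

import scala.Tuple3;

// The direct stream connects to the brokers, not ZooKeeper (address assumed).
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "localhost:9092");

// This overload requires explicit starting offsets; one partition per topic,
// starting at offset 0, is assumed here purely for illustration.
Map<TopicAndPartition, Long> fromOffsets = new HashMap<TopicAndPartition, Long>();
for (String t : new String[] { "test", "test1", "test2" }) {
    fromOffsets.put(new TopicAndPartition(t, 0), 0L);
}

// The messageHandler sees the full MessageAndMetadata, so the topic name
// can be captured into each record as a (topic, key, message) tuple.
JavaInputDStream<Tuple3<String, String, String>> records =
        KafkaUtils.createDirectStream(
                jssc,
                String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                (Class<Tuple3<String, String, String>>) (Class<?>) Tuple3.class,
                kafkaParams, fromOffsets,
                new Function<MessageAndMetadata<String, String>, Tuple3<String, String, String>>() {
                    public Tuple3<String, String, String> call(
                            MessageAndMetadata<String, String> mmd) {
                        return new Tuple3<String, String, String>(
                                mmd.topic(), mmd.key(), mmd.message());
                    }
                });

Each record then carries its topic name in the first tuple field, so downstream code no longer needs to guess which of the three topics a message came from.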

Unfortunately, with the receiver-based approach this is not straightforward, because KafkaReceiver and ReliableKafkaReceiver in Spark's source code only store MessageAndMetadata.key and MessageAndMetadata.message.

There are two open tickets related to this issue in Spark's JIRA; they have been open for a while.

A dirty copy/paste/modify of Spark's source code to solve your issue:

package org.apache.spark.streaming.kafka

import java.lang.{Integer => JInt}
import java.util.{Map => JMap, Properties}

import kafka.consumer.{KafkaStream, Consumer, ConsumerConfig, ConsumerConnector}
import kafka.serializer.{Decoder, StringDecoder}
import kafka.utils.VerifiableProperties
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.api.java.{JavaReceiverInputDStream, JavaStreamingContext}
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.receiver.Receiver
import org.apache.spark.streaming.util.WriteAheadLogUtils
import org.apache.spark.util.ThreadUtils
import scala.collection.JavaConverters._
import scala.collection.Map
import scala.reflect._

object MoreKafkaUtils {

  def createStream(
    jssc: JavaStreamingContext,
    zkQuorum: String,
    groupId: String,
    topics: JMap[String, JInt],
    storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
  ): JavaReceiverInputDStream[(String, String, String)] = {
    val kafkaParams = Map[String, String](
      "zookeeper.connect" -> zkQuorum, "group.id" -> groupId,
      "zookeeper.connection.timeout.ms" -> "10000")
    val walEnabled = WriteAheadLogUtils.enableReceiverLog(jssc.ssc.conf)
    new KafkaInputDStreamWithTopic[String, String, StringDecoder, StringDecoder](
      jssc.ssc, kafkaParams, topics.asScala.mapValues(_.intValue()), walEnabled, storageLevel)
  }

}

private[streaming]
class KafkaInputDStreamWithTopic[
  K: ClassTag,
  V: ClassTag,
  U <: Decoder[_] : ClassTag,
  T <: Decoder[_] : ClassTag](
    @transient ssc_ : StreamingContext,
    kafkaParams: Map[String, String],
    topics: Map[String, Int],
    useReliableReceiver: Boolean,
    storageLevel: StorageLevel
  ) extends ReceiverInputDStream[(K, V, String)](ssc_) with Logging {

  def getReceiver(): Receiver[(K, V, String)] = {
    if (!useReliableReceiver) {
      new KafkaReceiverWithTopic[K, V, U, T](kafkaParams, topics, storageLevel)
    } else {
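      // Note: ReliableKafkaReceiverWithTopic is the analogous modification of
      // Spark's ReliableKafkaReceiver; it is not shown in this snippet.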
      new ReliableKafkaReceiverWithTopic[K, V, U, T](kafkaParams, topics, storageLevel)
    }
  }
}

private[streaming]
class KafkaReceiverWithTopic[
  K: ClassTag,
  V: ClassTag,
  U <: Decoder[_] : ClassTag,
  T <: Decoder[_] : ClassTag](
    kafkaParams: Map[String, String],
    topics: Map[String, Int],
    storageLevel: StorageLevel
  ) extends Receiver[(K, V, String)](storageLevel) with Logging {

  // Connection to Kafka
  var consumerConnector: ConsumerConnector = null

  def onStop() {
    if (consumerConnector != null) {
      consumerConnector.shutdown()
      consumerConnector = null
    }
  }

  def onStart() {

    logInfo("Starting Kafka Consumer Stream with group: " + kafkaParams("group.id"))

    // Kafka connection properties
    val props = new Properties()
    kafkaParams.foreach(param => props.put(param._1, param._2))

    val zkConnect = kafkaParams("zookeeper.connect")
    // Create the connection to the cluster
    logInfo("Connecting to Zookeeper: " + zkConnect)
    val consumerConfig = new ConsumerConfig(props)
    consumerConnector = Consumer.create(consumerConfig)
    logInfo("Connected to " + zkConnect)

    val keyDecoder = classTag[U].runtimeClass.getConstructor(classOf[VerifiableProperties])
      .newInstance(consumerConfig.props)
      .asInstanceOf[Decoder[K]]
    val valueDecoder = classTag[T].runtimeClass.getConstructor(classOf[VerifiableProperties])
      .newInstance(consumerConfig.props)
      .asInstanceOf[Decoder[V]]

    // Create threads for each topic/message Stream we are listening
    val topicMessageStreams = consumerConnector.createMessageStreams(
      topics, keyDecoder, valueDecoder)

    val executorPool =
      ThreadUtils.newDaemonFixedThreadPool(topics.values.sum, "KafkaMessageHandler")
    try {
      // Start the messages handler for each partition
      topicMessageStreams.values.foreach { streams =>
        streams.foreach { stream => executorPool.submit(new MessageHandler(stream)) }
      }
    } finally {
      executorPool.shutdown() // Just causes threads to terminate after work is done
    }
  }

  // Handles Kafka messages
  private class MessageHandler(stream: KafkaStream[K, V])
    extends Runnable {
    def run() {
      logInfo("Starting MessageHandler.")
      try {
        val streamIterator = stream.iterator()
        while (streamIterator.hasNext()) {
          val msgAndMetadata = streamIterator.next()
          store((msgAndMetadata.key, msgAndMetadata.message, msgAndMetadata.topic))
        }
      } catch {
        case e: Throwable => reportError("Error handling message; exiting", e)
      }
    }
  }

}
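With this class compiled into your application under the same org.apache.spark.streaming.kafka package (so the private[streaming] types resolve), the original Java driver could be adapted along the lines of the hypothetical sketch below; note that Scala's default storageLevel argument must be passed explicitly from Java:

import org.apache.spark.api.java.function.Function;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.kafka.MoreKafkaUtils;

import scala.Tuple3;

// Each record is now a (key, message, topic) tuple.
JavaReceiverInputDStream<Tuple3<String, String, String>> messages =
        MoreKafkaUtils.createStream(jssc, "localhost:2181", "test-group", topicMap,
                StorageLevel.MEMORY_AND_DISK_SER_2());

messages.map(new Function<Tuple3<String, String, String>, String>() {
    public String call(Tuple3<String, String, String> record) {
        return record._3() + ": " + record._2(); // "topic: message"
    }
}).print();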
