Spark streaming Kafka messages not consumed

I want to receive messages from a topic in Kafka (broker v0.10.2.1) using Spark Streaming (1.6.2).

I'm using the Receiver approach. The code is as follows:

public static void main(String[] args) throws Exception
{
    SparkConf sparkConf = new SparkConf().setAppName("SimpleStreamingApp");
    JavaStreamingContext javaStreamingContext = new JavaStreamingContext(sparkConf, new Duration(5000));
    //
    Map<String, Integer> topicMap = new HashMap<>();
    topicMap.put("myTopic", 1);
    //
    String zkQuorum = "host1:port1,host2:port2,host3:port3";
    //
    Map<String, String> kafkaParamsMap = new HashMap<>();
    kafkaParamsMap.put("bootstraps.server", zkQuorum);
    kafkaParamsMap.put("metadata.broker.list", zkQuorum);
    kafkaParamsMap.put("zookeeper.connect", zkQuorum);
    kafkaParamsMap.put("group.id", "group_name");
    kafkaParamsMap.put("security.protocol", "SASL_PLAINTEXT");
    kafkaParamsMap.put("security.mechanism", "GSSAPI");
    kafkaParamsMap.put("ssl.kerberos.service.name", "kafka");
    kafkaParamsMap.put("key.deserializer", "kafka.serializer.StringDecoder");
    kafkaParamsMap.put("value.deserializer", "kafka.serializer.DefaultDecoder");
    //
    JavaPairReceiverInputDStream<byte[], byte[]> stream = KafkaUtils.createStream(javaStreamingContext,
                            byte[].class, byte[].class,
                            DefaultDecoder.class, DefaultDecoder.class,
                            kafkaParamsMap,
                            topicMap,
                            StorageLevel.MEMORY_ONLY());

    VoidFunction<JavaPairRDD<byte[], byte[]>> voidFunc = new VoidFunction<JavaPairRDD<byte[], byte[]>> ()
    {
       public void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception
       {
          List<Tuple2<byte[], byte[]>> all = rdd.collect();
          System.out.println("size of red: " + all.size());
       }
    };

    stream.forEach(voidFunc);

    javaStreamingContext.start();
    javaStreamingContext.awaitTermination();
}

Access to Kafka is Kerberized. When I launch:

spark-submit --verbose --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" --files jaas.conf,privKey.der --principal <accountName> --keytab <path to keytab file> --master yarn --jars <comma separated path to all jars> --class <fully qualified java main class> <path to jar file containing main class>

  1. The VerifiableProperties class from Kafka logs warning messages for the properties included in the kafkaParams map:
    INFO KafkaReceiver: connecting to zookeeper: <the correct zookeeper quorum provided in kafkaParams map>
    VerifiableProperties: Property auto.offset.reset is overridden to largest
    VerifiableProperties: Property enable.auto.commit is not valid.
    VerifiableProperties: Property sasl.kerberos.service.name is not valid
    VerifiableProperties: Property key.deserializer is not valid
    ...
    VerifiableProperties: Property zookeeper.connect is overridden to ....

I think that because these properties are not accepted, the stream processing might be affected (the receiver-based API expects the old consumer property names; see the sketch below).

**When I launch in cluster mode with --master yarn, these warning messages don't appear.**
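For reference, here is a minimal sketch (not part of the original question) of the property names the receiver-based API actually recognizes: KafkaUtils.createStream wraps the old Zookeeper-based high-level consumer, which is why new-consumer keys such as bootstrap.servers and key.deserializer are reported as "not valid". The Zookeeper hosts below are placeholders.

    // Sketch: properties recognized by the old (Zookeeper-based) consumer behind
    // KafkaUtils.createStream; host names are placeholders.
    Map<String, String> kafkaParamsMap = new HashMap<>();
    kafkaParamsMap.put("zookeeper.connect", "zkHost1:2181,zkHost2:2181"); // placeholder
    kafkaParamsMap.put("group.id", "group_name");
    kafkaParamsMap.put("auto.offset.reset", "largest"); // old-consumer value: "largest", not "latest"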

  2. Later, I see the following logs repeated every 5 seconds, as configured:

    INFO BlockRDD: Removing RDD 4 from persistence list

    INFO KafkaInputDStream: Removing blocks of RDD BlockRDD[4] at createStream at ...

    INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()

    INFO ... INFO BlockManager: Removing RDD 4

However, I don't see any actual messages printed on the console.

Question: Why is my code not printing any actual messages?

My Gradle dependencies are:

compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.6.2'
compile group: 'org.apache.spark', name: 'spark-streaming_2.10', version: '1.6.2'
compile group: 'org.apache.spark', name: 'spark-streaming-kafka_2.10', version: '1.6.2'

stream is an object of JavaPairReceiverInputDStream. Convert it into a DStream and use foreachRDD to print the messages consumed from Kafka.
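A minimal sketch of that suggestion (illustrative, not the original poster's code), assuming the message payloads are UTF-8 strings, which the question does not state:

    // Sketch: print each micro-batch's records; assumes UTF-8 string payloads.
    stream.foreachRDD(new VoidFunction<JavaPairRDD<byte[], byte[]>>()
    {
       public void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception
       {
          // collect() ships the whole batch to the driver, so use it for debugging only
          for (Tuple2<byte[], byte[]> record : rdd.collect())
          {
             System.out.println("message: " + new String(record._2(), "UTF-8"));
          }
       }
    });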

Spark 1.6.2 does not support Kafka 0.10; it only supports Kafka 0.8. For Kafka 0.10 you should use Spark 2.
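As an illustration (assuming Spark 2.x with the spark-streaming-kafka-0-10 connector, e.g. the Gradle coordinate org.apache.spark:spark-streaming-kafka-0-10_2.11), the Kafka 0.10 integration replaces the receiver with a direct stream and uses the new consumer property names; the broker hosts below are placeholders:

    // Sketch: direct stream with the Kafka 0.10 integration on Spark 2.x,
    // using org.apache.spark.streaming.kafka010.* and the new consumer config keys.
    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "host1:9092,host2:9092"); // placeholder hosts
    kafkaParams.put("key.deserializer", ByteArrayDeserializer.class);
    kafkaParams.put("value.deserializer", ByteArrayDeserializer.class);
    kafkaParams.put("group.id", "group_name");
    kafkaParams.put("auto.offset.reset", "latest");

    JavaInputDStream<ConsumerRecord<byte[], byte[]>> directStream =
            KafkaUtils.createDirectStream(
                    javaStreamingContext,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<byte[], byte[]>Subscribe(
                            Collections.singletonList("myTopic"), kafkaParams));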
