
Data from Kafka is not printed in the console when I submit the jar file (Spark Streaming + Kafka integration 3.1.1)

There is no error when I submit the jar file.

But no data is printed when I send data using the HTTP protocol.

(The data is printed fine when I check with "kafka-console-consumer.sh".)

[Picture, submitted jar file: data is not printed]

The code and dependencies in the jar file are shown below.

[Picture, kafka-console-consumer.sh: data is printed]

Command:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --group test-consumer --topic test01 --from-beginning


[JAVA FILE]

2-1, Dependencies

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.12</artifactId>
        <version>3.1.1</version>
        <scope>provided</scope>
    </dependency>
    
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
        <version>3.1.1</version>
    </dependency>
</dependencies>

2-2, Code

package SparkTest.SparkStreaming;

import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import java.util.*;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.kafka010.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;


public final class JavaWordCount {
    public static void main(String[] args) throws Exception {
        // Create a StreamingContext that runs on YARN with a batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("yarn").setAppName("JavaWordCount");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        
        // load a topic from broker
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "test-consumer");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        Collection<String> topics = Arrays.asList("test01");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
          KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferBrokers(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
          );
        
        JavaDStream<String> data = stream.map(v -> {
            return v.value();    // mapping to convert into spark D-Stream 
        });
      
        data.print();
        
        jssc.start();
        jssc.awaitTermination();
    }
}

You're using --from-beginning in the console consumer, but auto.offset.reset=latest in the Spark code.
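If you want the Spark job to also see records produced before it started (the rough equivalent of --from-beginning), you can change the consumer settings. A minimal sketch of that change, assuming you switch to a fresh group id so no previously committed offsets apply; the name test-consumer-spark is only illustrative:

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
// Use a group id with no committed offsets (illustrative name), so that
// "earliest" takes effect and the job reads the topic from the beginning.
kafkaParams.put("group.id", "test-consumer-spark");
kafkaParams.put("auto.offset.reset", "earliest");
kafkaParams.put("enable.auto.commit", false);

Note that auto.offset.reset only applies when the group has no committed offsets; since the console consumer already committed offsets for the group test-consumer, reusing that group id would make this setting irrelevant.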

Therefore, you need to run the producer while Spark is running if you want to see any data.

You will also want to consider using the spark-sql-kafka-0-10 Structured Streaming dependency instead, as shown in the KafkaWordCount example.
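For reference, a minimal Structured Streaming sketch of the same consumer, assuming the spark-sql-kafka-0-10_2.12 artifact (version 3.1.1, matching your Spark version) is on the classpath; the broker address and topic come from your code, and the class name is only illustrative. It simply prints each record value to the console rather than doing a word count:

package SparkTest.SparkStreaming;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public final class StructuredKafkaConsole {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("StructuredKafkaConsole")
            .getOrCreate();

        // Read the topic as a streaming Dataset and keep only the record value as a string.
        Dataset<Row> lines = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "test01")
            .option("startingOffsets", "earliest")  // also pick up records produced before the job started
            .load()
            .selectExpr("CAST(value AS STRING)");

        // Write every micro-batch to the console (the driver's stdout).
        StreamingQuery query = lines
            .writeStream()
            .outputMode("append")
            .format("console")
            .start();

        query.awaitTermination();
    }
}

With format("console") the batches go to the driver's stdout, so when running on YARN you may need to look at the driver container's logs rather than the terminal you submitted from.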
