
Apache Spark Streaming with Java & Kafka

I'm trying to run the Spark Streaming example from the official Spark website.

These are the dependencies I use in my pom file:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.2.0</version>
</dependency>

This is my Java code:

package com.myproject.spark;

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import com.myproject.spark.serialization.JsonDeserializer;

import scala.Tuple2;

public class MainEntryPoint {
  public static void main(String[] args) {
    Map<String, Object> kafkaParams = new HashMap<String, Object>();
    kafkaParams.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer",JsonDeserializer.class.getName());
    kafkaParams.put("group.id", "ttk-event-listener");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("enable.auto.commit", false);

    Collection<String> topics = Arrays.asList("topic1", "topic2");

    SparkConf conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("EMSStreamingApp");
    JavaStreamingContext streamingContext =
        new JavaStreamingContext(conf, Durations.seconds(1));

    JavaInputDStream<ConsumerRecord<String, String>> stream =
      KafkaUtils.createDirectStream(
        streamingContext,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
      );

    stream.mapToPair(record -> new Tuple2<>(record.key(), record.value()));
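    // Note: the Tuple2 stream produced above is never used. Spark Streaming
    // requires at least one output operation (e.g. .print()) before start(),
    // otherwise it fails with "No output operations registered".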


    streamingContext.start();
    try {
      streamingContext.awaitTermination();
    } catch (InterruptedException e) {
      // TODO Auto-generated catch block
      e.printStackTrace();
    }
  }
}

When I try to run it from Eclipse I get the following exception:

18/07/16 13:35:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.106, 51604, None)
18/07/16 13:35:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.106, 51604, None)
Exception in thread "main" java.lang.AbstractMethodError
at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:99)
at org.apache.spark.streaming.kafka010.KafkaUtils$.initializeLogIfNecessary(KafkaUtils.scala:39)
at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
at org.apache.spark.streaming.kafka010.KafkaUtils$.log(KafkaUtils.scala:39)
at org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66)
at org.apache.spark.streaming.kafka010.KafkaUtils$.logWarning(KafkaUtils.scala:39)
at org.apache.spark.streaming.kafka010.KafkaUtils$.fixKafkaParams(KafkaUtils.scala:201)
at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.<init>(DirectKafkaInputDStream.scala:63)
at org.apache.spark.streaming.kafka010.KafkaUtils$.createDirectStream(KafkaUtils.scala:147)
at org.apache.spark.streaming.kafka010.KafkaUtils$.createDirectStream(KafkaUtils.scala:124)
at org.apache.spark.streaming.kafka010.KafkaUtils$.createDirectStream(KafkaUtils.scala:168)
at org.apache.spark.streaming.kafka010.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.myproject.spark.MainEntryPoint.main(MainEntryPoint.java:47)
18/07/16 13:35:28 INFO SparkContext: Invoking stop() from shutdown hook

I run this from my IDE (Eclipse). Do I have to create the JAR and deploy it to Spark to make it run? If anyone knows about this exception, please share your experience. Thanks in advance.

Try using 2.3.1 for the spark-streaming-kafka dependency as well, so that all Spark artifacts share the same version.
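Concretely, the last dependency block from the pom above would then read (a sketch of the suggested fix; the artifact coordinates are taken from the question):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.1</version>
</dependency>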

Also check other related questions and their answers about java.lang.AbstractMethodError.

It usually means a mismatch between the libraries in use and their interfaces/implementations.
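One way to spot such a mismatch in a Maven project is to print the resolved dependency tree and confirm that every org.apache.spark artifact carries the same version and Scala suffix (here _2.11). Assuming a standard Maven setup, something like:

mvn dependency:tree -Dincludes=org.apache.spark

should list all Spark artifacts; any entry whose version differs from the rest (such as the 2.2.0 spark-streaming-kafka-0-10 above next to the 2.3.1 core and streaming artifacts) is a likely source of the AbstractMethodError.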
