
How to spark-submit a Spark Streaming application

I am new to Spark and don't know much about it yet. I am working on an application in which data travels across different Kafka topics, and Spark Streaming reads the data from those topics. It is a Spring Boot project and I have 3 Spark consumer classes in it. The job of these Spark Streaming classes is to consume the data from a Kafka topic and send it to another topic. The code of one Spark Streaming class is below:

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.springframework.stereotype.Service;

@Service
public class EnrichEventSparkConsumer {

    Collection<String> topics = Arrays.asList("eventTopic");

    public void startEnrichEventConsumer(JavaStreamingContext javaStreamingContext) {

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "group1");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", true);

        // Direct stream from Kafka; each record's value is the raw JSON payload.
        JavaInputDStream<ConsumerRecord<String, String>> enrichEventRDD = KafkaUtils.createDirectStream(javaStreamingContext,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        JavaDStream<String> enrichEventDStream = enrichEventRDD.map((x) -> x.value());
        JavaDStream<EnrichEventDataModel> enrichDataModelDStream = enrichEventDStream.map(convertIntoEnrichModel);

        // collect() pulls each batch to the driver before saving it.
        enrichDataModelDStream.foreachRDD(rdd1 -> {
            saveDataToElasticSearch(rdd1.collect());
        });

        enrichDataModelDStream.foreachRDD(enrichDataModelRdd -> {
            if (enrichDataModelRdd.count() > 0) {
                // executor is a field of this class, not shown in this snippet.
                if (executor != null) {
                    executor.executePolicy(enrichDataModelRdd.collect());
                }
            }
        });

    }

    // Deserialize a JSON record into the data model.
    static Function<String, EnrichEventDataModel> convertIntoEnrichModel = new Function<String, EnrichEventDataModel>() {

        @Override
        public EnrichEventDataModel call(String record) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            return mapper.readValue(record, EnrichEventDataModel.class);
        }
    };

    // dataModelServiceImpl is an injected service field, not shown in this snippet.
    private void saveDataToElasticSearch(List<EnrichEventDataModel> baseDataModelList) {
        for (EnrichEventDataModel baseDataModel : baseDataModelList)
            dataModelServiceImpl.save(baseDataModel);
    }
}

I am calling the startEnrichEventConsumer() method from a CommandLineRunner:

public class EnrichEventSparkConsumerRunner implements CommandLineRunner {

    @Autowired
    JavaStreamingContext javaStreamingContext;

    @Autowired
    EnrichEventSparkConsumer enrichEventSparkConsumer;

    @Override
    public void run(String... args) throws Exception {
        // Start the Raw Event Spark Consumer.
        JobContextImpl jobContext = new JobContextImpl(javaStreamingContext);

        //start Enrich Event Spark Consumer.
        enrichEventSparkConsumer.startEnrichEventConsumer(jobContext.streamingctx());
    }

}

Now I want to submit these three Spark Streaming classes to the cluster. I read somewhere that I have to create a jar file first, and after that I can use the spark-submit command, but I have some questions in my mind:

  1. Should I create a separate project with these 3 Spark Streaming classes?
  2. As of now I am using a CommandLineRunner to initiate Spark Streaming; when submitting to the cluster, should I create a main() method in these classes?

Please tell me how to do it. Thanks in advance.

  • No need for a separate project.
  • You should create an entry point, a main() method, that is responsible for creating the JavaStreamingContext; see the sketch after this list.
  • Create your jar with its dependencies bundled into one single jar file, and don't forget to give all your Spark dependencies the provided scope (in Maven, <scope>provided</scope>), since you will use the cluster's libraries.
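
For reference, a minimal sketch of such an entry point could look like the following. The package name, app name, batch interval, and the direct instantiation of the consumer are assumptions, not part of the original code; the master URL is deliberately not set in the SparkConf so that spark-submit can supply it.

package com.example; // hypothetical package

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class Main {

    public static void main(String[] args) throws InterruptedException {
        // No master URL here; spark-submit provides it via --master.
        SparkConf conf = new SparkConf().setAppName("EnrichEventStreaming");
        JavaStreamingContext javaStreamingContext =
                new JavaStreamingContext(conf, Durations.seconds(10));

        // Instantiated directly here for simplicity; in the Spring Boot setup
        // you could bootstrap the ApplicationContext and fetch the bean instead.
        EnrichEventSparkConsumer enrichEventSparkConsumer = new EnrichEventSparkConsumer();
        enrichEventSparkConsumer.startEnrichEventConsumer(javaStreamingContext);

        javaStreamingContext.start();            // start the streaming job
        javaStreamingContext.awaitTermination(); // block the driver until it is stopped
    }
}

With a main() like this, the CommandLineRunner is no longer needed for the cluster deployment; the --class option of spark-submit points at this class.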

You execute the assembled Spark application with the spark-submit command-line tool, as follows:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

For a local submit:

bin/spark-submit \
  --class package.Main \
  --master local[2] \
  path/to/jar argument1 argument2
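
Applied to this application, and assuming the assembled fat jar is named enrich-event-consumer-all.jar with the Main class sketched above as the entry point (both names are placeholders), a cluster submit could look like:

bin/spark-submit \
  --class com.example.Main \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  path/to/enrich-event-consumer-all.jar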
