
Kafka producer failover mechanism and validation of data being pushed to topic

I have written code to push data to a Kafka topic on a daily basis, but there are a few issues which I am not sure this code will be able to handle. My responsibility is to push the complete data from a live table which holds one day of data (refreshed every morning).

My code runs `select * from mytable` and pushes the rows one by one to the Kafka topic; before pushing, I need to validate/alter each row.

Below is my producer send code.

    Properties configProperties = new Properties();
    configProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, sBOOTSTRAP_SERVERS_CONFIG);
    configProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
    configProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringSerializer");
    configProperties.put("acks", "all");
    configProperties.put("retries", 0);
    configProperties.put("batch.size", 15000);
    configProperties.put("linger.ms", 1);
    configProperties.put("buffer.memory", 30000000);
    @SuppressWarnings("resource")
    KafkaProducer<String, String> producer = new KafkaProducer<String, String>(configProperties);
    System.out.println("Starting Kafka producer job  " + new Date());
    producer.send(new ProducerRecord<String, String>(eventName, jsonRec.toString()), new Callback() {
        public void onCompletion(RecordMetadata metadata, Exception e) {
            if (e != null) {
                e.printStackTrace();
            }
        }
    });

Now, I am not sure how to push data into the topic again in case of failure, since I selected all the records from the table, a few of them failed, and I do not know which ones.

Below is what I want to address:

  1. How can I process only those records which were not pushed, so as to avoid duplicate records being pushed (avoid redundancy)?

  2. How can I validate that the records pushed are exactly the same as in the table, i.e. data integrity, such as the size of the data and the count of records pushed?

You can use `configProperties.put("enable.idempotence", true);` for the first question: the producer will retry failed messages but make sure exactly one copy of each record is saved in Kafka. Note that it requires retries > 0, acks=all and max.in.flight.requests.per.connection <= 5, so the retries=0 setting in your snippet would have to change. For details check https://kafka.apache.org/documentation/ .
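A minimal sketch of a compatible configuration, using the same string keys as the snippet in the question (the broker address is a placeholder, and the exact values for retries and in-flight requests are one reasonable choice, not the only one):

```java
import java.util.Properties;

public class IdempotentProducerProps {

    // Builds producer properties with idempotence enabled.
    static Properties idempotentProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Broker de-duplicates retried batches via producer id + sequence numbers.
        props.put("enable.idempotence", "true");
        // Settings compatible with idempotence; conflicting values (e.g. retries=0)
        // make the producer fail fast at construction time.
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties p = idempotentProps("localhost:9092"); // placeholder address
        System.out.println(p.getProperty("enable.idempotence"));
        System.out.println(p.getProperty("acks"));
    }
}
```

These properties would then be passed to `new KafkaProducer<>(...)` exactly as in the question's code.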

For the 2nd question: if you mean that you need to save all records or none, then you have to use Kafka transactions, which brings up a lot more questions; I would recommend reading https://www.confluent.io/blog/transactions-apache-kafka/
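For orientation, the transactional send loop looks roughly like the pattern below. This is a sketch only, not runnable standalone: it assumes a running broker, that `configProperties` additionally contains a `transactional.id` (required for transactions), and that `rows` and `eventName` come from the question's table-reading code.

```java
// Sketch: wraps the whole day's push in a single transaction.
KafkaProducer<String, String> producer = new KafkaProducer<>(configProperties);
producer.initTransactions();
try {
    producer.beginTransaction();
    for (String jsonRec : rows) {            // rows = records read from the table
        producer.send(new ProducerRecord<>(eventName, jsonRec));
    }
    producer.commitTransaction();            // all records become visible atomically
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Fatal errors: the producer must be closed, not reused.
    producer.close();
} catch (KafkaException e) {
    // Abort so consumers with isolation.level=read_committed never see partial data.
    producer.abortTransaction();
}
```

Note that this all-or-nothing guarantee only holds for consumers that read with `isolation.level=read_committed`.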

Note: posts on this site follow the CC BY-SA 4.0 license; if you need to repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.
