Can I set task.commit.ms to every 1ms?

I have a project that uses Apache Samza, and I have a problem with duplicate data.

This is my checkpoint configuration:

task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.checkpoint.replication.factor=2
task.commit.ms=20000

The documentation says:

If task.checkpoint.factory is configured, this property determines how often a checkpoint is written. The value is the time between checkpoints, in milliseconds. The frequency of checkpointing affects failure recovery: if a container fails unexpectedly (eg due to crash or machine failure) and is restarted, it resumes processing at the last checkpoint. Any messages processed since the last checkpoint on the failed container are processed again. Checkpointing more frequently reduces the number of messages that may be processed twice, but also uses more resources.
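To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. It assumes a steady, hypothetical rate of 1,000 messages per second per task (not a figure from my job) and one checkpoint write per task per commit interval. The class and method names are made up for illustration and are not part of Samza.

```java
public class ReplayEstimate {

    // Rough upper bound on messages reprocessed after a crash:
    // everything handled since the last checkpoint, assuming a steady rate.
    public static long maxReplayedMessages(long messagesPerSecond, long commitMs) {
        return messagesPerSecond * commitMs / 1000;
    }

    // Number of checkpoint writes per hour for one task,
    // assuming one write per commit interval.
    public static long checkpointsPerHour(long commitMs) {
        return 3_600_000L / commitMs;
    }

    public static void main(String[] args) {
        long rate = 1_000; // hypothetical messages per second per task
        long[] candidates = {20_000, 5_000, 250, 1}; // task.commit.ms values to compare
        for (long commitMs : candidates) {
            System.out.printf(
                "task.commit.ms=%d: up to %d replayed messages, %d checkpoints per hour%n",
                commitMs,
                maxReplayedMessages(rate, commitMs),
                checkpointsPerHour(commitMs));
        }
    }
}
```

Under these assumptions, going from 20000 to 1 shrinks the replay window from about 20,000 messages to about 1, but raises the checkpoint writes from 180 to 3,600,000 per hour for each task. That is the "uses more resources" part of the quote.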

So, can I change task.commit.ms from 20000 to 250 or even 1? Is that a good idea or a very bad one? I have a very good cluster.

I need to change this because the Samza worker crashes 1-3 times each week. For now, my temporary solution is to commit the offset each time.


Documentation references:

Apache-Samza

Apache-Samza-Configuration

My solution, which I know does not solve every problem, is to set task.commit.ms to the same value as task.shutdown.ms=5000.

Atlas-Samza-Configuration Shutdown
