How to properly implement a Kafka consumer as a background service on .NET Core
I have implemented a Kafka consumer as a console app using BackgroundService on .NET Core 2.2, with confluent-kafka-dotnet v1.0.1.1 as the client for Apache Kafka. I'm not sure about the right way to process each message.
Since processing each message can take a long time (up to 24 hours), I start a new Task for each message so that the consumer isn't blocked from consuming new ones. I suspect that if there are too many messages, creating a new Task each time is not the right approach. What is the proper way to process each message? Is it possible to create some kind of dynamic background service per message?
If a message is already being processed but the application crashes or a rebalance occurs, I end up consuming and processing the same message more than once. Should I commit the offset automatically (or right after the message is consumed) and store the state of the message (or task) somewhere, e.g. in a database?
I know that there is Hangfire, but I'm not sure whether I need to use it. If my current approach is totally wrong, please give me some suggestions.
Here is the implementation of ConsumerService:
public class ConsumerService : BackgroundService
{
    private readonly IConfiguration _config;
    private readonly IElasticLogger _logger;
    private readonly ConsumerConfig _consumerConfig;
    private readonly string[] _topics;
    private readonly double _maxNumAttempts;
    private readonly double _retryIntervalInSec;

    public ConsumerService(IConfiguration config, IElasticLogger logger)
    {
        _config = config;
        _logger = logger;
        _consumerConfig = new ConsumerConfig
        {
            BootstrapServers = _config.GetValue<string>("Kafka:BootstrapServers"),
            GroupId = _config.GetValue<string>("Kafka:GroupId"),
            EnableAutoCommit = _config.GetValue<bool>("Kafka:Consumer:EnableAutoCommit"),
            AutoOffsetReset = (AutoOffsetReset)_config.GetValue<int>("Kafka:Consumer:AutoOffsetReset")
        };
        _topics = _config.GetValue<string>("Kafka:Consumer:Topics").Split(',');
        _maxNumAttempts = _config.GetValue<double>("App:MaxNumAttempts");
        _retryIntervalInSec = _config.GetValue<double>("App:RetryIntervalInSec");
    }

    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        Console.WriteLine("!!! CONSUMER STARTED !!!\n");
        // Starting a new Task here because the Consume() method is synchronous
        var task = Task.Run(() => ProcessQueue(stoppingToken), stoppingToken);
        return task;
    }

    private void ProcessQueue(CancellationToken stoppingToken)
    {
        using (var consumer = new ConsumerBuilder<Ignore, Request>(_consumerConfig)
            .SetValueDeserializer(new MessageDeserializer())
            .Build())
        {
            consumer.Subscribe(_topics);
            try
            {
                while (!stoppingToken.IsCancellationRequested)
                {
                    try
                    {
                        var consumeResult = consumer.Consume(stoppingToken);
                        // Don't want to block the consume loop, so start a new Task for each message
                        Task.Run(async () =>
                        {
                            var currentNumAttempts = 0;
                            var committed = false;
                            var response = new Response();

                            while (currentNumAttempts < _maxNumAttempts)
                            {
                                currentNumAttempts++;
                                // SendDataAsync is a method that sends an HTTP request to some endpoints
                                response = await Helper.SendDataAsync(consumeResult.Value, _config, _logger);
                                if (response != null && response.Code >= 0)
                                {
                                    try
                                    {
                                        consumer.Commit(consumeResult);
                                        committed = true;
                                        break;
                                    }
                                    catch (KafkaException ex)
                                    {
                                        // log
                                    }
                                }
                                else
                                {
                                    // log
                                }

                                if (currentNumAttempts < _maxNumAttempts)
                                {
                                    // Delay between retries
                                    await Task.Delay(TimeSpan.FromSeconds(_retryIntervalInSec));
                                }
                            }

                            if (!committed)
                            {
                                try
                                {
                                    consumer.Commit(consumeResult);
                                }
                                catch (KafkaException ex)
                                {
                                    // log
                                }
                            }
                        }, stoppingToken);
                    }
                    catch (ConsumeException ex)
                    {
                        // log
                    }
                }
            }
            catch (OperationCanceledException ex)
            {
                // log
                consumer.Close();
            }
        }
    }
}
Agree with Fabio that you should not use Task.Run to process a message, since you'll end up with lots of threads wasting resources and context-switching, so performance will suffer.
Moreover, it's okay to process the consumed message on the same thread, since Kafka uses a pull model and your application can process messages at its own pace.
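A minimal sketch of that same-thread approach with Confluent.Kafka is below. ProcessMessage is a hypothetical placeholder for your own handling logic; auto-commit is disabled so the offset is committed only after processing succeeds (at-least-once delivery):

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

public static class SequentialConsumerSketch
{
    public static void Run(ConsumerConfig config, string topic, CancellationToken stoppingToken)
    {
        // Commit manually, only after the message has actually been processed.
        config.EnableAutoCommit = false;

        using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
        {
            consumer.Subscribe(topic);
            try
            {
                while (!stoppingToken.IsCancellationRequested)
                {
                    var result = consumer.Consume(stoppingToken);

                    // Process on the same thread: with the pull model, the broker
                    // simply waits until we ask for the next message.
                    ProcessMessage(result.Message.Value);

                    // Commit only after successful processing.
                    consumer.Commit(result);
                }
            }
            catch (OperationCanceledException)
            {
                consumer.Close(); // leave the consumer group cleanly
            }
        }
    }

    // Hypothetical placeholder for the actual work done per message.
    private static void ProcessMessage(string value)
    {
        Console.WriteLine($"Processed: {value}");
    }
}
```

If a single message can really take hours, note that this blocks the poll loop, so max.poll.interval.ms would need to be raised accordingly or the work handed off with paused partitions.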
Regarding processing a message more than once, I'd suggest storing the offset of each processed message so that already-processed messages can be skipped. Since the offset is a long-based number, you can easily skip any message whose offset is less than the one committed earlier. Of course, this only works if you have a single partition, because Kafka provides its offset counter and ordering guarantees at the partition level.
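The skip-by-offset idea can be sketched as follows, under the single-partition assumption. LoadLastProcessedOffset and SaveProcessedOffset are hypothetical placeholders for your own storage, e.g. a database table:

```csharp
using System.Threading;
using Confluent.Kafka;

public static class OffsetSkipSketch
{
    public static void Run(IConsumer<Ignore, string> consumer, CancellationToken token)
    {
        // Restore the high-water mark of processed work, e.g. from a database.
        long lastProcessed = LoadLastProcessedOffset();

        while (!token.IsCancellationRequested)
        {
            var result = consumer.Consume(token);

            // Offsets are monotonically increasing longs within a partition,
            // so anything at or below the stored offset was already handled.
            if (result.Offset.Value <= lastProcessed)
                continue;

            Process(result.Message.Value);

            // Persist the processed offset before (or together with) the commit,
            // so a crash between the two only causes a harmless skip on restart.
            lastProcessed = result.Offset.Value;
            SaveProcessedOffset(lastProcessed);
            consumer.Commit(result);
        }
    }

    private static long LoadLastProcessedOffset() => -1;        // placeholder
    private static void SaveProcessedOffset(long offset) { }    // placeholder
    private static void Process(string value) { }               // placeholder
}
```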
You can find an example of a Kafka consumer in my article. If you have questions, feel free to ask; I'm glad to help.