
Kafka doesn't consume all produced data

I have a single instance of Kafka installed on a VM with 8 cores and 32 GB of RAM.

I write to it (produce) from 10 different machines and consume from one machine, all on the same network.

The rate at which I produce data is ~35 Mbit/s.

For some reason, most of the time I can't consume more than ~10 Mbit/s (for limited periods of time I do manage to consume all the produced data), even though both the Kafka server and the consumer machine are mostly idle (so I don't think it's a throughput problem).

Could Kafka be ignoring some of the produced data?

Some parameter values that might be useful for analysis:

num.network.threads=32
num.io.threads=16
message.max.bytes=2147483647
num.partitions=10
log.retention.ms=120000 (2 minutes)
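
(One way to verify that all the produced data is actually reaching the topic — a sketch assuming the stock Kafka shell tools on the broker host and a placeholder topic name my-topic — is to sample the partition end offsets twice and diff them to estimate the real inbound rate:)

# Print the latest offset of each partition (--time -1); run this twice,
# e.g. a minute apart, and compare the totals to estimate the produce rate
bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 --topic my-topic --time -1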

Your retention time is far too low. If your consumer ever falls more than 2 minutes behind any one of the 10 producers, those messages will be deleted before they can be consumed. Try 24 hours, or at least as much as you have disk space for. The default retention period is 7 days. Keeping messages for a longer period will also help you debug whether they are all getting into the topic successfully.
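
As a minimal sketch of how to apply that, assuming the stock Kafka shell tools and placeholder names my-topic and my-group for the topic and consumer group (on Kafka versions before 2.0, kafka-configs.sh takes --zookeeper instead of --bootstrap-server):

# Raise retention for this topic to 24 hours; a topic-level retention.ms
# overrides the broker-wide log.retention.ms
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config retention.ms=86400000

# Check how far the consumer group is behind the producers; if LAG grows
# beyond ~2 minutes' worth of traffic under the old setting, messages
# expire before they are ever read
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group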
