
How to enable GC logging for Apache Kafka brokers, while preventing log file overwrites and capping disk space usage

We recently decided to enable GC logging for Apache Kafka brokers on a number of clusters (the exact version varies) as an aid to investigating Kafka-related memory and garbage collection problems. We want to do that for running brokers (not for Kafka operations such as "kafka-topics.sh"). We also want to avoid two problems we know might happen:

  • overwriting of the log file when a broker restarts for any reason
  • the logs using too much disk space, leading to disks getting filled (if you keep the cluster running long enough, log files will fill up the disk unless they are managed)

When Java GC logging starts for a process, it seems to replace the content of any existing file that has the same name. This means that unless you are careful, you will lose the GC logging from before a restart, perhaps just when you are most likely to need it.

Setting the environment variable GC_LOG_ENABLED to "true" before running kafka-server-start.sh enables GC logging, but doesn't address the two problems above. It adds this fixed set of parameters: -Xloggc:<gc-log-file-loc> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps, where gc-log-file-loc has the same directory and name as the .out file, but ends in "-gc.log" instead of ".out".
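As a minimal sketch of that approach (assuming you start the broker directly and server.properties is your broker config):
export GC_LOG_ENABLED=true
./kafka-server-start.sh server.properties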

You can set KAFKA_GC_LOG_OPTS with the specific JVM parameters below before running kafka-server-start.sh. This works because kafka-run-class.sh specifically includes the contents of that environment variable in the JVM options, but only if it is passed -loggc on its command line; kafka-server-start.sh does pass this.
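If you want to verify for yourself which scripts pass -loggc, a quick check from the Kafka installation directory (the location of that directory is an assumption) is:
grep -n -- '-loggc' bin/*.sh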

If you are starting Kafka through Apache Ambari, you would set KAFKA_GC_LOG_OPTS in Kafka service > Configs > Advanced kafka-env > kafka-env template. If you set it there, it seems it will only be used by kafka-server-start.sh; the other scripts do not currently pass -loggc to kafka-run-class.sh.

Now let's discuss the JVM parameters to include in KAFKA_GC_LOG_OPTS.

To enable GC logging to a file, you will need to add -verbose:gc -Xloggc:<log-file-location>.
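For example (the directory here is an assumption):
# Note: a fixed file name like this gets overwritten on restart; see below
export KAFKA_GC_LOG_OPTS="-verbose:gc -Xloggc:/var/log/kafka/kafka-broker-gc.log"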

You need to give the log file name special consideration to prevent overwrites whenever the broker is restarted. It seems you need a unique name for every invocation, so appending a timestamp seems like the best option. You can add something like `date +'%Y%m%d%H%M'` to add a timestamp; in this example, it is in the form YYYYMMDDHHMM. In some versions of Java you can put "%t" in your log file location and it will be replaced by the broker start-up timestamp, formatted as YYYY-MM-DD_HH-MM-SS.
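A sketch of the %t alternative (the directory is an assumption; confirm your JVM version expands %t in -Xloggc):
# The JVM expands %t at startup, e.g. kafka-broker-gc-2020-01-01_12-00-00.log
export KAFKA_GC_LOG_OPTS="-verbose:gc -Xloggc:/var/log/kafka/kafka-broker-gc-%t.log"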

Now, on to managing the use of disk space. I'll be happy if there is a simpler way than what I have.

First, take advantage of Java's built-in GC log file rotation. -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M is an example of enabling this rotation, keeping up to 10 GC log files from the JVM, each of which is no more than 100MB in size. 10 x 100MB is 1000MB of maximum usage.

With GC log file rotation in place with up to 10 files, '.0', '.1', ... '.9' will be appended to the file name you gave in -Xloggc. '.0' will be first, and after it reaches '.9' it will replace '.0' and continue on in a round-robin manner. In some versions of Java, '.current' will additionally be put on the end of the name of the log file currently being written to.
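Illustratively (with a hypothetical timestamp), a single invocation might leave files named:
kafka-broker-gc.log-202001011200.0
kafka-broker-gc.log-202001011200.1
...
kafka-broker-gc.log-202001011200.9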

Due to the unique file naming we apparently have to use to avoid overwrites, you can have 1000MB per broker process invocation, so this is not a total solution to managing the disk space used by Kafka broker GC logs. You will end up with a set of up to 10 GC log files for each invocation of each broker -- this can add up over time. The best solution (under *nix) would seem to be to use the logrotate utility (or some other utility) to periodically clean up broker GC logs that have not been modified in the last N days.
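A sketch of that cleanup (the directory, name pattern, and 15-day threshold are assumptions; run it periodically, e.g. from cron):
# Delete broker GC logs not modified in the last 15 days
find /var/log/kafka -name 'kafka-broker-gc.log-*' -mtime +15 -delete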

Be sure to do the math and make sure you will have enough disk space.
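For example (hypothetical numbers): if a host's broker is restarted four times before old logs are cleaned up, the five invocations could leave 5 x 10 x 100MB = 5000MB of GC logs on that disk.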

People frequently want more detail and context in their GC logs than the default, so consider adding -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps, as is done with GC_LOG_ENABLED=true.

Putting all the parameters together into KAFKA_GC_LOG_OPTS and starting a broker, you might have:
TIMESTAMP=`date +'%Y%m%d%H%M'`
# GC log location/name prior to .n addition by log rotation
GC_LOG_NAME="{{kafka_log_dir}}/kafka-broker-gc.log-$TIMESTAMP"

GC_LOG_ENABLE_OPTS="-verbose:gc -Xloggc:$GC_LOG_NAME"
GC_LOG_ROTATION_OPTS="-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M"
GC_LOG_FORMAT_OPTS="-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"

export KAFKA_GC_LOG_OPTS="$GC_LOG_ENABLE_OPTS $GC_LOG_ROTATION_OPTS $GC_LOG_FORMAT_OPTS"
./kafka-server-start.sh server.properties

From the command line, replace {{kafka_log_dir}} with the location of your Kafka log directory, or wherever you want the GC logs to go. You can change the log file naming too.

Under Ambari, you would add those lines (but not the line running kafka-server-start.sh) to the "kafka-env template" field in the Ambari UI. {{kafka_log_dir}} will be automatically replaced with the Kafka log directory, which is defined shortly above that field. You'll need to restart the brokers for the GC logging to start (consider doing a rolling restart).
