
Can I dynamically alter log levels in Google Dataflow once the job has started?

From https://cloud.google.com/dataflow/docs/guides/logging I am trying to understand if it is possible to reconfigure the log levels after the job and workers have started.

I suspect the answer is "no" but I wanted to see if anyone knows for sure, as the documentation does not specifically address this point.

Facilities such as log4j have the ability to dynamically monitor the logging configuration and act on it, but I don't know if Dataflow supports that.
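For reference, here is a minimal sketch of the kind of runtime reconfiguration log4j 2 itself offers (nothing Dataflow-specific; the class name is illustrative). log4j 2 can also re-read its config file periodically when the monitorInterval attribute is set on the <Configuration> element of log4j2.xml:

```java
// A minimal log4j 2 sketch (nothing Dataflow-specific; the class name is
// illustrative): changing a logger's level at runtime via the Configurator API.
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.core.config.Configurator;

public class DynamicLevelDemo {
    private static final Logger LOG = LogManager.getLogger(DynamicLevelDemo.class);

    public static void main(String[] args) {
        LOG.debug("hidden: DEBUG is below the default level");
        // Raise this logger's verbosity without restarting the JVM.
        Configurator.setLevel(DynamicLevelDemo.class.getName(), Level.DEBUG);
        LOG.debug("visible: the level was changed at runtime");
    }
}
```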

Good question. I don't think there is a way to do that; once the workers have started, the only option is to stop the pipeline if you see something wrong in the logs. But I don't think Dataflow supports what you are looking for.

AFAIK, by design, it's not possible. If you imagine the architecture of Dataflow, what do you have?

A main server that takes your code, compiles it, packages it, and deploys it on the workers (that's why, at the beginning, you have only one instance, the main server, and then automatic scaling kicks in).

Then, the data are pulled and transformed on the workers. The code on the workers is immutable; the main server will never update it (except if you perform a roll-out in streaming mode).


Of course, you could imagine that, upon reading a special value, a worker updates its own log level locally. But you can't assume that all the workers will receive that information, because the data are sharded and each worker sees only a subset of the data.
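To make that concrete, here is a purely hypothetical sketch of the idea (the "__SET_LOG_LEVEL:" control element and the use of log4j as the worker's logging backend are my assumptions, not a Dataflow feature). Note that only the worker that happens to receive the control element would reconfigure itself:

```java
// Hypothetical sketch only -- not a Dataflow feature. A DoFn that changes
// its own worker's log level when it sees a special control element.
// Because the data is sharded, only the worker that receives this element
// would reconfigure itself; the others keep their original level.
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

class LevelAwareFn extends DoFn<String, String> {
    private static final String MARKER = "__SET_LOG_LEVEL:";  // assumed convention

    @ProcessElement
    public void processElement(@Element String element, OutputReceiver<String> out) {
        if (element.startsWith(MARKER)) {
            // e.g. "__SET_LOG_LEVEL:DEBUG" -- affects only this worker's JVM.
            Configurator.setRootLevel(Level.valueOf(element.substring(MARKER.length())));
            return;
        }
        out.output(element);  // normal elements pass through unchanged
    }
}
```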


But, at the end of the day, what's your concern? Do you have too many logs? If so, you can use the Cloud Logging router to exclude some logs. They won't be ingested and therefore won't be charged.
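For example, from the command line (a sketch; the exclusion name and filter are illustrative, assuming your logs flow through the default _Default sink):

```sh
# Stop ingesting (and paying for) Dataflow worker logs at INFO and below.
gcloud logging sinks update _Default \
  --add-exclusion=name=exclude-dataflow-info,filter='resource.type="dataflow_step" AND severity<=INFO'
```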

If the logs slow down your workload, then you have to rethink/redesign your logging strategy and levels before launching your code.
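For completeness, worker log levels can at least be chosen at launch time. A sketch assuming the Beam Java SDK's DataflowWorkerLoggingOptions (these are baked in at job submission, which is exactly why they can't be changed mid-flight):

```java
// Sketch: fixing worker log levels at launch time with the Beam Java SDK.
// Changing them later means redeploying (or updating a streaming pipeline).
import org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LaunchWithLogLevels {
    public static void main(String[] args) {
        DataflowWorkerLoggingOptions options = PipelineOptionsFactory
            .fromArgs(args)  // passing --defaultWorkerLogLevel=WARN also works
            .as(DataflowWorkerLoggingOptions.class);
        options.setDefaultWorkerLogLevel(DataflowWorkerLoggingOptions.Level.WARN);

        Pipeline p = Pipeline.create(options);
        // ... build the pipeline, then p.run() ...
    }
}
```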
