
Can I dynamically alter log levels in Google Dataflow once the job has started?

From https://cloud.google.com/dataflow/docs/guides/logging I am trying to understand if it is possible to reconfigure the log levels after the job and workers have started.

I suspect the answer is "no" but I wanted to see if anyone knows for sure, as the documentation does not specifically address this point.

Facilities such as log4j have the ability to dynamically monitor the logging configuration and act on it, but I don't know if Dataflow supports that.
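For reference, here is a minimal sketch of the kind of runtime reconfiguration log4j 2 itself offers (nothing Dataflow-specific; the class name is illustrative). log4j 2 can also re-read its config file periodically when the monitorInterval attribute is set on the <Configuration> element of log4j2.xml:

```java
// A minimal log4j 2 sketch (nothing Dataflow-specific; the class name is
// illustrative): changing a logger's level at runtime via the Configurator API.
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.core.config.Configurator;

public class DynamicLevelDemo {
    private static final Logger LOG = LogManager.getLogger(DynamicLevelDemo.class);

    public static void main(String[] args) {
        LOG.debug("hidden: DEBUG is below the default level");
        // Raise this logger's verbosity without restarting the JVM.
        Configurator.setLevel(DynamicLevelDemo.class.getName(), Level.DEBUG);
        LOG.debug("visible: the level was changed at runtime");
    }
}
```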

Good question. I don't think there is a way to do that; once the workers have started, the only option is to stop the pipeline if you see something wrong in the logs. But I don't think Dataflow supports what you are looking for.

AFAIK, by design, it's not possible. If you imagine the architecture of Dataflow, what do you have?

A main server that takes your code, compiles it, packages it, and deploys it on the workers (that's why, at the beginning, you have only one instance, the main server, and then automatic scaling kicks in).

Then, the data are pulled and transformed on the workers. The code on the workers is immutable; the main server will never update it (except if you perform a roll-out in streaming mode).


Of course, you could imagine that, upon reading a special value, a worker updates its own log level locally. But you can't assume that all the workers will receive that information, because the data are sharded and each worker sees only a subset of the data.
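To make that concrete, here is a purely hypothetical sketch of the idea (the "__SET_LOG_LEVEL:" control element and the use of log4j as the worker's logging backend are my assumptions, not a Dataflow feature). Note that only the worker that happens to receive the control element would reconfigure itself:

```java
// Hypothetical sketch only -- not a Dataflow feature. A DoFn that changes
// its own worker's log level when it sees a special control element.
// Because the data is sharded, only the worker that receives this element
// would reconfigure itself; the others keep their original level.
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

class LevelAwareFn extends DoFn<String, String> {
    private static final String MARKER = "__SET_LOG_LEVEL:";  // assumed convention

    @ProcessElement
    public void processElement(@Element String element, OutputReceiver<String> out) {
        if (element.startsWith(MARKER)) {
            // e.g. "__SET_LOG_LEVEL:DEBUG" -- affects only this worker's JVM.
            Configurator.setRootLevel(Level.valueOf(element.substring(MARKER.length())));
            return;
        }
        out.output(element);  // normal elements pass through unchanged
    }
}
```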


But, at the end of the day, what's your concern? Do you have too many logs? If so, you can use the Cloud Logging router to exclude some logs. They won't be ingested and therefore won't be charged.
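For example, from the command line (a sketch; the exclusion name and filter are illustrative, assuming your logs flow through the default _Default sink):

```sh
# Stop ingesting (and paying for) Dataflow worker logs at INFO and below.
gcloud logging sinks update _Default \
  --add-exclusion=name=exclude-dataflow-info,filter='resource.type="dataflow_step" AND severity<=INFO'
```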

If the logs slow down your workload, then you have to rethink/redesign your logging strategy and levels before launching your code.
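For completeness, worker log levels can at least be chosen at launch time. A sketch assuming the Beam Java SDK's DataflowWorkerLoggingOptions (these are baked in at job submission, which is exactly why they can't be changed mid-flight):

```java
// Sketch: fixing worker log levels at launch time with the Beam Java SDK.
// Changing them later means redeploying (or updating a streaming pipeline).
import org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LaunchWithLogLevels {
    public static void main(String[] args) {
        DataflowWorkerLoggingOptions options = PipelineOptionsFactory
            .fromArgs(args)  // passing --defaultWorkerLogLevel=WARN also works
            .as(DataflowWorkerLoggingOptions.class);
        options.setDefaultWorkerLogLevel(DataflowWorkerLoggingOptions.Level.WARN);

        Pipeline p = Pipeline.create(options);
        // ... build the pipeline, then p.run() ...
    }
}
```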
