
How to schedule a Dataflow job by running a Google Compute Engine cron job

The Dataflow FAQ lists running custom (cron) job processes on Compute Engine as one way to schedule Dataflow pipelines. I am confused about how exactly that should be done: how do I start the Dataflow job on Compute Engine, and how do I set up the cron job?

Thank you!

I have this working on App Engine, but I imagine it is similar on Compute Engine.

Cron will hit an endpoint on your service at the frequency you specify. So you need to set up a request handler for that endpoint that launches the Dataflow job when hit (essentially, in your request handler you define your pipeline and then call 'run' on it).
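As a rough illustration of that handler, here is a minimal sketch using only the standard library's HTTP server plus Apache Beam. The endpoint path, project, region and bucket names are placeholders I made up, and the pipeline itself is a toy; treat it as a shape to adapt rather than the exact setup described above.

```python
from http.server import BaseHTTPRequestHandler

CRON_PATH = "/cron/launch-dataflow"  # hypothetical endpoint that cron hits


def launch_pipeline():
    """Define the pipeline and call run() on it."""
    # Imported lazily so this module loads even without Apache Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # placeholder
        region="us-central1",                # placeholder
        temp_location="gs://my-bucket/tmp",  # placeholder
    )
    pipeline = beam.Pipeline(options=options)
    _ = (
        pipeline
        | "Create" >> beam.Create(["hello", "world"])
        | "Upper" >> beam.Map(str.upper)
    )
    # Submits the job; without wait_until_finish() this does not wait
    # for the job itself to complete.
    pipeline.run()


def handle(path, launcher=launch_pipeline):
    """Route one request and return (status_code, body)."""
    if path != CRON_PATH:
        return 404, "not found"
    launcher()
    return 200, "launched"


class CronHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle()."""

    def do_GET(self):
        status, body = handle(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())
```

To try it on a VM you could serve it with `HTTPServer(("", 8080), CronHandler).serve_forever()` (from `http.server`) and have a crontab entry request the path with curl. On Compute Engine it may be simpler still to have cron run a script that calls `launch_pipeline()` directly.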

Those are the basics. As an extra step, I have the request handler for my cron job launch a Cloud Task, and then the request handler for that Cloud Task launches the Dataflow job. I do this because I've noticed that the 'run' call for pipelines sometimes takes a while, and Cloud Tasks have a 10 minute timeout, compared to the 30s timeout for cron jobs (or was it 60s?).
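A sketch of the cron-to-task hop might look like the following, assuming the google-cloud-tasks client library and an App Engine task handler. The queue name and the `/tasks/launch-dataflow` path are hypothetical; the task handler behind that path would be the one that actually defines the pipeline and calls 'run'.

```python
def build_task(relative_uri):
    """Task payload pointing at the App Engine handler that launches the job."""
    return {"app_engine_http_request": {"relative_uri": relative_uri}}


def enqueue_launch(project, location, queue,
                   relative_uri="/tasks/launch-dataflow"):
    """Called from the cron handler: hand the slow work off to Cloud Tasks."""
    # Imported lazily; requires the google-cloud-tasks package.
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = build_task(relative_uri)
    task["app_engine_http_request"]["http_method"] = tasks_v2.HttpMethod.POST
    return client.create_task(request={"parent": parent, "task": task})
```

The cron handler then only needs to call something like `enqueue_launch("my-project", "us-central1", "dataflow-launch")` (placeholder values) and return right away, which keeps it well inside the shorter cron timeout.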

You can use Google Cloud Scheduler to execute your Dataflow job. Cloud Scheduler works with targets, which can be HTTP/S endpoints, Pub/Sub topics, or App Engine applications, so you can use your Dataflow template as the target. See this external article for an example: Schedule Your Dataflow Batch Jobs With Cloud Scheduler. Or, if you want to add more services to the interaction: Scheduling Dataflow Pipeline using Cloud Run, PubSub and Cloud Scheduler.
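For the template route, the HTTP target would typically be the Dataflow `templates:launch` REST method. The sketch below only builds the URL and JSON body such a Scheduler job would POST; the project, region, template path, job name and parameters are all placeholders, and the Scheduler job itself would also need to authenticate with a service account that is allowed to launch Dataflow jobs.

```python
import json
from urllib.parse import urlencode

DATAFLOW_API = "https://dataflow.googleapis.com/v1b3"


def build_launch_request(project, region, template_gcs_path, job_name,
                         parameters=None):
    """Return (url, body) for a templates:launch call on a classic template."""
    query = urlencode({"gcsPath": template_gcs_path})
    url = (
        f"{DATAFLOW_API}/projects/{project}/locations/{region}"
        f"/templates:launch?{query}"
    )
    body = {"jobName": job_name, "parameters": parameters or {}}
    return url, json.dumps(body)


url, body = build_launch_request(
    project="my-project",                                   # placeholder
    region="us-central1",                                   # placeholder
    template_gcs_path="gs://my-bucket/templates/my-template",
    job_name="nightly-batch",                               # placeholder
    parameters={"inputFile": "gs://my-bucket/input.txt"},   # template-specific
)
```

The resulting `url` and `body` are what you would paste into the Scheduler job's HTTP target (method POST), alongside the cron-style schedule.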

