Python script scheduling on Google Cloud

Original post: 2021-10-30 08:37:10. Tags: python, google-cloud-platform, google-cloud-run, google-cloud-scheduler, python-sched

I have a Python script that needs to run every day at 00:00 (midnight) on Google Cloud, possibly using Google Cloud Run. What I'd like to know is quite specific, and I couldn't find a good answer to it: which approach is technically best? Is it better to let the cloud trigger the script at a set time, or to keep a container always running that waits (using locks) for that time of day to arrive and then runs a function? The task itself is quite heavy: the script scans pictures (downloaded from an Instagram page) and tries to extract plain text from them.

As I've never implemented anything like this in a cloud environment, what I need to know boils down to this:
How much heavier is a "lock-waiting" script than a cloud-managed scheduler (e.g. Google Cloud Scheduler)? Economically speaking, does the difference matter when the task is as heavy as the one in my script?

2 answers

I think Cloud Scheduler may be a good first approach. It can, for example, make an HTTP request or publish a message to a Pub/Sub topic (which can then serve as a pull or push trigger for your script).

By "script" I mean whatever functionality is required. It can be implemented in many different ways: a Cloud Function (or a group of Cloud Functions working together to achieve one goal), a Cloud Run service, or anything else.

My usual personal preference is the pattern Cloud Scheduler => Pub/Sub topic => push Cloud Function. Other people may prefer other variations.
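As a rough illustration of the last step of that pattern, here is a minimal sketch of a Pub/Sub-triggered function in the style of a Python background Cloud Function (1st gen), where the message payload arrives base64-encoded in `event["data"]`. The function name `on_schedule`, the helper `run_daily_job`, and the payload text are hypothetical; the actual image-scanning work is not shown.

```python
import base64


def run_daily_job(note: str) -> str:
    """Hypothetical placeholder for the real work (image download + OCR)."""
    return f"job started ({note})"


def on_schedule(event: dict, context=None) -> str:
    """Entry point called once per Pub/Sub message.

    Assumes the background-function convention in which the message
    body is base64-encoded under event["data"]; Cloud Scheduler would
    publish that message on the configured schedule.
    """
    raw = event.get("data")
    note = base64.b64decode(raw).decode("utf-8") if raw else "no payload"
    return run_daily_job(note)
```

Because the entry point is a plain function, it can be exercised locally by passing a dictionary shaped like a Pub/Sub event before anything is deployed.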

The choice of solution in your case (including the "script" implementation) depends, I think, on functional and non-functional requirements, context, scope, the skills and knowledge of the people who will develop and maintain the solution, time, CAPEX and OPEX budget, etc.

I don't know if this is technically the best option, but I would go with a combination of Cloud Run and Cloud Scheduler (we currently have this combination running for one of our projects).

Cloud Run, because your script seems to run just once a day and Cloud Run basically goes to sleep when it is not serving a request. This lowers the overall cost: it wakes up when it receives a request, executes the request, and goes back to sleep (you are not charged while it is sleeping).
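To make the cost argument concrete, here is a back-of-the-envelope sketch comparing how many seconds per day a container is up under each approach. The 30-minute job duration is purely an assumption for illustration, and the sketch deliberately ignores actual prices, startup time, and billing granularity.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def active_seconds_per_day(job_seconds: int, always_on: bool) -> int:
    """Seconds per day the container is up (not a price calculation)."""
    if not 0 <= job_seconds <= SECONDS_PER_DAY:
        raise ValueError("job_seconds must fit within one day")
    # An always-running container is up all day, even while it only waits.
    # An on-demand container is up only while it handles the daily request.
    return SECONDS_PER_DAY if always_on else job_seconds


job = 30 * 60  # assumed job duration: 30 minutes
waiting = active_seconds_per_day(job, always_on=True)
on_demand = active_seconds_per_day(job, always_on=False)
print(waiting // on_demand)  # 48
```

Under that assumed duration, the waiting container is up 48 times longer than the on-demand one for the same amount of useful work.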

Cloud Scheduler, to trigger the URL endpoint on Cloud Run at 00:00 (midnight). As the name implies, Scheduler schedules jobs to run at specific times.
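A minimal sketch of the receiving side, using only the Python standard library: a small HTTP server with one endpoint that the scheduler could call. The path `/run`, the helper `run_daily_job`, and the cron string `0 0 * * *` (every day at 00:00 in standard unix-cron syntax) are assumptions for illustration. The server reads its port from the `PORT` environment variable, which is how Cloud Run tells a container where to listen.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

DAILY_AT_MIDNIGHT = "0 0 * * *"  # assumed schedule for the Cloud Scheduler job


def run_daily_job() -> str:
    """Hypothetical placeholder for the real work (image download + OCR)."""
    return "done"


def handle_trigger(method: str, path: str) -> Tuple[int, str]:
    """Decide the response for a request; only POST /run starts the job."""
    if path != "/run":
        return 404, "not found"
    if method != "POST":
        return 405, "use POST"
    return 200, run_daily_job()


class TriggerHandler(BaseHTTPRequestHandler):
    def _respond(self) -> None:
        # Delegate the decision to handle_trigger, then write the reply.
        status, body = handle_trigger(self.command, self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = _respond
    do_POST = _respond


def main() -> None:
    """Container entry point: serve until the instance is shut down."""
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), TriggerHandler).serve_forever()
```

Calling `main()` from the container's start command would start the server. Note that in this sketch the job runs inside the request itself, so it would have to finish within the service's request timeout.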

I would also suggest securing your URL endpoint (the one that will be deployed on Cloud Run). This ensures that only your Cloud Scheduler job triggers the URL (nobody can 'mistakenly' access the URL over the internet and trigger the job unless they have the necessary privileges). We have a blog article about how to do this.