
Is it possible to run a custom python script in Apache beam or google cloud dataflow

I want to run one of my python scripts using GCP. I am fairly new to GCP, so I don't have much experience with it.

My python script grabs data from BigQuery and performs these tasks:

- Several data processing operations
- Building an ML model using KDTree and a few clustering algorithms
- Dumping the final result to a BigQuery table

This script needs to run every night.
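The shape of the three steps can be sketched as below. This is only an illustration, not the actual job: the BigQuery calls are shown as comments so the snippet runs without credentials, the dummy points stand in for the fetched rows, and the real schema, queries, and clustering steps would differ.

```python
# Sketch of the nightly job's three steps (BigQuery I/O hedged in comments;
# dummy points stand in for the fetched rows).
import numpy as np
from scipy.spatial import KDTree

# 1. Grab data from BigQuery -- in the real job, something like:
#    from google.cloud import bigquery
#    df = bigquery.Client().query("SELECT ... FROM dataset.table").to_dataframe()
points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [5.1, 5.2]])

# 2. Data processing + build the model (KDTree, as in the question).
tree = KDTree(points)

# 3. Use the model, e.g. each point's nearest neighbour
#    (k=2 because the closest hit is the point itself).
dist, idx = tree.query(points, k=2)
nearest = idx[:, 1]

# 4. Dump results back to BigQuery -- in the real job, something like:
#    bigquery.Client().load_table_from_dataframe(result_df, "dataset.result_table")
print(nearest.tolist())  # -> [1, 0, 3, 2]
```

The point of the sketch is that the whole job is ordinary single-process Python, which is relevant to the question of whether Dataflow's map-reduce model is actually needed here.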

So far I know I can use VMs, Cloud Run, or Cloud Functions (not a good option for me, as it will take about an hour to finish everything). What would be the best choice for running this?

I came across Dataflow, but I am curious to know whether it's possible to run a custom python script that can do all these things in google cloud dataflow (I assume I would have to convert everything into a map-reduce format, which doesn't seem easy with my code, especially the ML part)?

Do you just need a python script to run on a single instance for a couple of hours and then terminate?

You could set up a 'basic scaling' App Engine micro-service within your GCP project. The maximum run-time for task queue tasks is 24 hours when using 'basic scaling'.
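A minimal sketch of what such a service's `app.yaml` might look like; the service name, runtime version, and timings are placeholders you would adjust:

```yaml
# Hypothetical app.yaml for a basic-scaling App Engine service
service: nightly-job
runtime: python39
basic_scaling:
  max_instances: 1
  idle_timeout: 10m
```

The nightly trigger could then come from an App Engine `cron.yaml` entry that hits a handler on this service every night.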

Requests can run for up to 24 hours. A basic-scaled instance can choose to handle /_ah/start and execute a program or script for many hours without returning an HTTP response code. Task queue tasks can run up to 24 hours.

https://cloud.google.com/appengine/docs/standard/python/how-instances-are-managed
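The /_ah/start pattern described above can be sketched as follows, assuming a Flask app (the standard-environment Python runtime serves a WSGI app); `run_nightly_job` is a placeholder for the actual BigQuery/KDTree pipeline:

```python
# Minimal sketch of a basic-scaling start handler (assumes Flask;
# run_nightly_job is a stand-in for the real pipeline).
from flask import Flask

app = Flask(__name__)

def run_nightly_job():
    # 1. pull rows from BigQuery
    # 2. data processing + KDTree / clustering model
    # 3. write results back to a BigQuery table
    pass

@app.route("/_ah/start")
def start():
    # Under basic scaling this handler may work for hours before responding.
    run_nightly_job()
    return "ok", 200
```

The same `run_nightly_job` could equally be invoked from a task queue task handler, which is the 24-hour limit the quoted documentation refers to.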
