
GCP Dataflow Computation Graph and Job Execution

Hi everyone, I have tried hard to understand what happens when I create a custom template in Google Cloud Dataflow, but I could not figure it out from the GCP documentation. Below is what I am trying to achieve (see the pipeline sketch after the list).

  1. Read data from a Google Cloud Storage bucket
  2. Pre-process it
  3. Load deep learning models (1 GB each) and get predictions
  4. Dump the results into BigQuery
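
For readers who want to picture the pipeline shape, here is a minimal sketch of those four steps in the Beam Python SDK. The bucket paths, table name, schema, and the preprocess/predict placeholders are illustrative assumptions, not the asker's actual code; where the models get loaded is covered in the answer further down.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def preprocess(record):
    # Placeholder: whatever parsing / feature extraction the job needs.
    return record


def predict(record):
    # Placeholder: model inference. Where the 1 GB models get loaded is the
    # interesting part -- see the DoFn sketch in the answer below.
    return {"prediction": record}


def run(argv=None):
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (
            p
            # 1. Read data from a Cloud Storage bucket.
            | "ReadFromGCS" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
            # 2. Pre-process it.
            | "PreProcess" >> beam.Map(preprocess)
            # 3. Run the deep-learning models and get predictions.
            | "Predict" >> beam.Map(predict)
            # 4. Dump the results into BigQuery.
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:my_dataset.predictions",
                schema="prediction:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```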

I successfully created the template and I am able to execute the job, but I have the following questions.

  1. When I execute the job, are the models (5 models, 1 GB each) downloaded every time during execution, or are they loaded and baked into the template (execution graph) so that execution reuses the already-loaded copies?
  2. If the models are loaded only during job execution, does that not impact the execution time, since GBs of model files have to be loaded every time the job is triggered?
  3. Can multiple users trigger the same template at the same time? Since I want to productionize this, I am not sure how it will handle multiple requests at once.

Can anyone please share some information on it?

Sources I referred to without finding the answer:
https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#pipeline-lifecycle-from-pipeline-code-to-dataflow-job
http://alumni.media.mit.edu/~wad/magiceight/isa/node3.html
https://cloud.google.com/dataflow/docs/guides/setting-pipeline-options#configuring-pipelineoptions-for-local-execution
https://beam.apache.org/documentation/basics/
https://beam.apache.org/documentation/runtime/model/
https://mehmandarov.com/apache-beam-pipeline-graph/

This depends on where the models are being loaded from. If they are loaded in the DoFns (most likely), then the loading happens on the workers, during job execution.
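
To make that concrete, below is a minimal sketch (Beam Python SDK) of the common pattern of doing the heavy loading in DoFn.setup(), which runs once per DoFn instance on a worker rather than once per element. The _load_model helper and the gs:// path are placeholder assumptions, not a real implementation.

```python
import apache_beam as beam


class PredictDoFn(beam.DoFn):
    """Loads a large model once per DoFn instance and runs inference per element."""

    def __init__(self, model_path):
        self._model_path = model_path  # e.g. "gs://my-bucket/models/model-1" (placeholder)
        self._model = None

    def setup(self):
        # setup() is called when the DoFn instance is initialized on a worker,
        # so the ~1 GB download and deserialization happen here, not for
        # every element processed.
        self._model = self._load_model(self._model_path)

    def process(self, element):
        yield self._model.predict(element)

    @staticmethod
    def _load_model(path):
        # Placeholder: real code would fetch the file from GCS and load it
        # with the relevant ML framework (TensorFlow, PyTorch, ...).
        raise NotImplementedError
```

Assuming the models are fetched from storage inside the DoFn, the staged (classic) template holds the serialized execution graph and pipeline code rather than the model files, so the model bytes are downloaded at execution time; setup() just ensures that cost is paid once per DoFn instance instead of once per record.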

As for your other question, there should be no issues with multiple users triggering a template job simultaneously; each launch creates a separate Dataflow job with its own workers.
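
As one hedged illustration, a staged (classic) template can be launched programmatically through the Dataflow v1b3 REST API (projects.locations.templates.launch); the sketch below uses google-api-python-client, and the project, region, GCS path, job name, and parameters are all placeholders.

```python
from googleapiclient.discovery import build


def launch_template(project, region, template_gcs_path, job_name, parameters):
    """Launches one Dataflow job from a staged template.

    Each call creates an independent job, so concurrent launches by
    different users do not conflict with each other.
    """
    dataflow = build("dataflow", "v1b3")
    request = dataflow.projects().locations().templates().launch(
        projectId=project,
        location=region,
        gcsPath=template_gcs_path,
        body={"jobName": job_name, "parameters": parameters},
    )
    return request.execute()


if __name__ == "__main__":
    response = launch_template(
        project="my-project",
        region="us-central1",
        template_gcs_path="gs://my-bucket/templates/my-template",
        job_name="prediction-job-001",
        parameters={"input": "gs://my-bucket/input/*.csv"},
    )
    print(response)
```

The CLI equivalent is roughly `gcloud dataflow jobs run JOB_NAME --gcs-location gs://<bucket>/<template> --region <region>`; either way, every invocation becomes its own job.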
