
Wordcount Example on GCP Stuck

I followed the examples on https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go for both Python and Go, but when I deploy the job to Dataflow, the job doesn't progress past 0% for more than 20 minutes.

Are there any known issues with Dataflow that would prevent this job from completing?

The options I used to execute the job:

python -m apache_beam.examples.wordcount \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output <output_bucket> \
    --runner DataflowRunner \
    --project <project_id> \
    --region us-west1 \
    --tmp_location <gcp_tmp_bucket> \
    --service_account_email=<service_account> \
    --subnetwork=<subnetwork_path>

Your job is stalled because you haven't filled in the placeholder values in the example command 😄

Cancel the job. It should eventually time out if no progress is detected, but you are billed for the workers that keep running while it's stuck.
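If you prefer the command line over the console, a stuck job can be cancelled with the gcloud CLI (this assumes gcloud is installed and authenticated; `JOB_ID` is a placeholder for the ID shown in the Dataflow console or by `gcloud dataflow jobs list`):

```shell
# List running Dataflow jobs in the region to find the job ID
gcloud dataflow jobs list --region=us-west1 --status=active

# Cancel the stuck job so you stop being billed for its workers
gcloud dataflow jobs cancel JOB_ID --region=us-west1
```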

  • You need to create a GCS bucket: this is passed to --output, e.g. --output "gs://yourbucket/output"
  • You need to specify your current GCP project in --project your_project
  • Change the region in --region if you are not working out of us-west1
  • You can specify a subpath of the bucket you created earlier for --tmp_location, e.g. --tmp_location "gs://yourbucket/tmp"
  • A service account is optional - leave this out and the job will use the default Compute Engine service account.
  • A subnetwork is optional as well - leave it out and Dataflow will use the default subnetwork (and each worker will have a public IP).

Fill these options in the command and re-run.
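Put together, a filled-in run might look like the sketch below. The bucket name `my-wordcount-bucket` and project ID `my-gcp-project` are hypothetical placeholders you must replace with your own values:

```shell
# Create a bucket in the same region as the job (name is a placeholder)
gsutil mb -l us-west1 gs://my-wordcount-bucket

# Submit the wordcount pipeline to Dataflow; service account and
# subnetwork are omitted, so the defaults described above are used
python -m apache_beam.examples.wordcount \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output gs://my-wordcount-bucket/output \
    --runner DataflowRunner \
    --project my-gcp-project \
    --region us-west1 \
    --tmp_location gs://my-wordcount-bucket/tmp
```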
