
Wordcount Example on GCP Stuck

I followed the examples on https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go for both Python and Go, but when I deploy the job to Dataflow, it stays at 0% progress for more than 20 minutes.

Are there any known issues with Dataflow that would prevent this job from completing?

The command and options I used to run the job:

python -m apache_beam.examples.wordcount \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output <output_bucket> \
    --runner DataflowRunner \
    --project <project_id> \
    --region us-west1 \
    --tmp_location <gcp_tmp_bucket> \
    --service_account_email=<service_account> \
    --subnetwork=<subnetwork_path>

Your job is stuck because you haven't filled in the placeholder values (the <...> parts) in the example command 😄
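
A quick way to catch this before submitting is to scan the arguments for leftover <...> placeholders. The sketch below uses a hypothetical helper, unfilled_placeholders (it is not part of Apache Beam). It only looks for angle-bracket tokens and does not check whether the values you filled in are valid.

```python
import re
from typing import List

# Matches tokens such as "<project_id>" or "--subnetwork=<subnetwork_path>".
PLACEHOLDER = re.compile(r"<[^<>]+>")


def unfilled_placeholders(argv: List[str]) -> List[str]:
    """Return every argument that still contains an angle-bracket placeholder."""
    return [arg for arg in argv if PLACEHOLDER.search(arg)]


if __name__ == "__main__":
    # Arguments copied from the question, with the placeholders left in.
    argv = [
        "--input", "gs://dataflow-samples/shakespeare/kinglear.txt",
        "--output", "<output_bucket>",
        "--runner", "DataflowRunner",
        "--project", "<project_id>",
        "--region", "us-west1",
        "--tmp_location", "<gcp_tmp_bucket>",
        "--service_account_email=<service_account>",
        "--subnetwork=<subnetwork_path>",
    ]
    for arg in unfilled_placeholders(argv):
        print("not filled in:", arg)
```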

Cancel the job. Dataflow may eventually time it out on its own if it detects no progress, but you are billed for the running worker for as long as the job stays stuck.
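
Besides the Cancel button in the Dataflow console, the gcloud CLI can cancel a job with `gcloud dataflow jobs cancel JOB_ID --region=REGION`. The sketch below only builds (and optionally runs) that command from Python; it assumes gcloud is installed and authenticated, and JOB_ID is a placeholder for the ID shown in the console or by `gcloud dataflow jobs list --region=us-west1`.

```python
import shlex
import subprocess
from typing import List


def cancel_command(job_id: str, region: str) -> List[str]:
    """Build the gcloud CLI call that cancels a Dataflow job in the given region."""
    return ["gcloud", "dataflow", "jobs", "cancel", job_id, f"--region={region}"]


def cancel_job(job_id: str, region: str) -> None:
    """Run the cancel command; requires an installed, authenticated gcloud."""
    subprocess.run(cancel_command(job_id, region), check=True)


if __name__ == "__main__":
    # JOB_ID is a placeholder: copy the real job ID before calling cancel_job.
    print(shlex.join(cancel_command("JOB_ID", "us-west1")))
```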

  • You need to create a GCS bucket; its path is passed to --output, e.g. --output "gs://yourbucket/output"
  • Specify your GCP project ID: --project your_project
  • If you are not working in us-west1, change the value of --region
  • For --tmp_location you can use a subpath of the same bucket: --tmp_location "gs://yourbucket/tmp"
  • A service account is optional: leave --service_account_email out and Dataflow will use the default Compute Engine service account.
  • A subnetwork is optional as well: leave --subnetwork out and Dataflow will use the default subnetwork (and each worker will get a public IP address).
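
The checklist can be sketched as a small Python function that assembles the full command from your own values. The function name wordcount_args and the values "yourbucket" and "your_project" are hypothetical; the flags are the ones used in the question. The two optional flags are only added when a value is given, so leaving them out falls back to the defaults described in the list.

```python
import shlex
from typing import List, Optional


def wordcount_args(
    bucket: str,
    project: str,
    region: str = "us-west1",
    service_account: Optional[str] = None,
    subnetwork: Optional[str] = None,
) -> List[str]:
    """Assemble the wordcount command line from filled-in values.

    bucket is the bare bucket name (e.g. "yourbucket"); output and temp
    files are placed under gs://<bucket>/output and gs://<bucket>/tmp.
    """
    if bucket.startswith("gs://"):
        raise ValueError("pass the bucket name without the gs:// prefix")
    args = [
        "python", "-m", "apache_beam.examples.wordcount",
        "--input", "gs://dataflow-samples/shakespeare/kinglear.txt",
        "--output", f"gs://{bucket}/output",
        "--runner", "DataflowRunner",
        "--project", project,
        "--region", region,
        "--tmp_location", f"gs://{bucket}/tmp",
    ]
    # Optional flags: when omitted, Dataflow uses the default Compute Engine
    # service account and the default subnetwork.
    if service_account:
        args.append(f"--service_account_email={service_account}")
    if subnetwork:
        args.append(f"--subnetwork={subnetwork}")
    return args


if __name__ == "__main__":
    # "yourbucket" and "your_project" are placeholders; use your own values.
    print(shlex.join(wordcount_args("yourbucket", "your_project")))
```

Printing the command with shlex.join lets you paste it into a shell, or you can pass the list directly to subprocess.run.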

Fill in these values in the command and re-run it.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM