Wordcount Example on GCP Stuck

Question

I followed the quickstart at https://cloud.google.com/dataflow/docs/quickstarts/create-pipeline-go for both Python and Go, but when I deploy the job to Dataflow, it doesn't progress past 0% for more than 20 minutes.

Are there any known issues with Dataflow that would prevent this job from completing?

The options I used to execute the job:

python -m apache_beam.examples.wordcount \
    --input gs://dataflow-samples/shakespeare/kinglear.txt \
    --output <output_bucket> \
    --runner DataflowRunner \
    --project <project_id> \
    --region us-west1 \
    --tmp_location <gcp_tmp_bucket> \
    --service_account_email=<service_account> \
    --subnetwork=<subnetwork_path>

Answer 1

Your job is stagnating because you haven't filled in the values in the example command 😄

Cancel the job. It should eventually time out if it doesn't detect anything happening, but you are billed for the workers that run while it is stuck.

You need to create a GCS bucket and pass a path inside it to --output, for example --output "gs://yourbucket/output"
You need to specify your current GCP project: --project your_project
Change --region if you are not working out of us-west1.
You can use a subpath of the bucket you created earlier for the temporary location: --temp_location "gs://yourbucket/tmp" (the Beam Python option is spelled temp_location, not tmp_location).
A service account is optional: leave it out and Dataflow will use the default Compute Engine service account.
A subnetwork is optional as well: leave it out and Dataflow will use the default subnetwork (and each worker will have a public IP).

Fill in these options in the command and re-run.
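
If you want to catch this before launching (and before a worker starts billing), a small pre-flight check over the argument list can help. The sketch below is only an illustration: check_options is a hypothetical helper, not part of Apache Beam or the Dataflow SDK, and it assumes the Beam Python spelling --temp_location and that --output and --temp_location should both be gs:// paths.

```python
import re

# Hypothetical pre-flight helper; not part of Apache Beam or the Dataflow SDK.
# A value such as "<output_bucket>" is treated as an unfilled placeholder.
PLACEHOLDER = re.compile(r"^<[^<>]+>$")

# Assumption: these options are expected to point at Cloud Storage paths.
GCS_OPTIONS = {"--output", "--temp_location"}


def check_options(argv):
    """Return a list of human-readable problems found in argv."""
    # Normalise "--flag=value" and "--flag value" into (flag, value) pairs.
    pairs = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("--") and "=" in token:
            flag, value = token.split("=", 1)
            pairs.append((flag, value))
        elif (token.startswith("--") and i + 1 < len(argv)
              and not argv[i + 1].startswith("--")):
            pairs.append((token, argv[i + 1]))
            i += 1  # skip the value we just consumed
        i += 1

    problems = []
    for flag, value in pairs:
        if PLACEHOLDER.match(value):
            problems.append(f"{flag} still has the placeholder {value}")
        elif flag in GCS_OPTIONS and not value.startswith("gs://"):
            problems.append(f"{flag} should be a gs:// path, got {value}")
    return problems


if __name__ == "__main__":
    # Example values only; replace them with your own project and bucket.
    args = [
        "--output", "<output_bucket>",
        "--runner", "DataflowRunner",
        "--project", "your_project",
        "--region", "us-west1",
        "--temp_location", "gs://yourbucket/tmp",
    ]
    for problem in check_options(args):
        print(problem)
```

With the example arguments above, the only problem reported is the unfilled --output value. An empty list means nothing obviously wrong was found; it does not guarantee that the bucket exists or that the job will run.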

solution1
0 2022-06-11 02:12:59
