DataFlow worker BigQuery permission error

I have been trying to run a Dataflow pipeline (Python) in a project where my GCP account has the "Owner" role.

The pipeline performs the following tasks:

  1. Read data from BigQuery (same project where Dataflow pipeline is running).
  2. Apply some transformations.
  3. Load the resulting data into GCS.

As I understand it, Dataflow workers use the default Compute Engine service account (-compute@developer.gserviceaccount.com) to access other GCP services, including BigQuery, and that account has the "Editor" role.

But when I run the pipeline with DataflowRunner, I get the error below.

Error:

BigQuery execution failed., Error: Message: Access Denied: Project: User does not have bigquery.jobs.create permission in project. HTTP Code: 403

The same pipeline runs fine with DirectRunner.

I also tried granting the Dataflow Worker and Dataflow Admin roles to
-compute@developer.gserviceaccount.com, even though it already has the "Editor" role, but the pipeline still fails with the same error.

Could you please help with your inputs to resolve this issue?

Execution command:

python -m bigquery_to_gcs --input gs://<GCS_path>/input --output gs://<GCS_path>/results/output.txt --project --region us-central1 --staging_location gs://<GCS_path>/staging --temp_location gs://<GCS_path>/tmp --runner DataflowRunner

As stated in the Dataflow security and permissions documentation, two accounts need to be set up with the proper access roles for BigQuery. In your case, that means the BigQuery Job User or BigQuery User role, either of which provides the bigquery.jobs.create permission.

The two accounts are:

  • The Google Cloud account that you use to run the Dataflow job.
  • The worker service account that runs the Dataflow job.

The worker service account is fine: you are using -compute@developer.gserviceaccount.com with the Editor role, which already includes the bigquery.jobs.create permission.

The Google Cloud account that you use to run the Dataflow job is the one you need to fix: it must be granted a proper BigQuery access role.

How was this account selected? By one of these methods:
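Once you know which account that is, granting the role could look like the minimal sketch below. It only builds the gcloud projects add-iam-policy-binding command line rather than running it. The helper name build_grant_command, the project ID my-project, and the email someone@example.com are placeholders introduced for illustration, not values from the question.

```python
import shlex

# Predefined roles named in this answer as carrying bigquery.jobs.create.
BIGQUERY_JOB_ROLES = ("roles/bigquery.jobUser", "roles/bigquery.user")


def build_grant_command(project_id, account_email, is_service_account=False,
                        role="roles/bigquery.jobUser"):
    """Return the gcloud argv that binds `role` to the account on the project.

    Hypothetical helper: it only assembles the command, it does not execute it.
    """
    if role not in BIGQUERY_JOB_ROLES:
        raise ValueError(f"{role} is not one of {BIGQUERY_JOB_ROLES}")
    # IAM members are prefixed by type: user accounts vs. service accounts.
    prefix = "serviceAccount" if is_service_account else "user"
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={prefix}:{account_email}",
        f"--role={role}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own project ID and account.
    cmd = build_grant_command("my-project", "someone@example.com")
    print(shlex.join(cmd))
```

Printing the command instead of executing it lets you review the member and role before changing the project's IAM policy. Pass is_service_account=True if the job is launched with a service account key.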

  • You ran gcloud auth application-default login, in which case it's a named user account like Anjan.B@gmail.com.
  • When you ran your python -m command, you were redirected to a web flow to choose a named user account like Anjan.B@gmail.com.
  • You ran export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service/account/key.json, in which case it's a service account.
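To see which of these applies on your machine, a rough sketch like the following can help. It checks the GOOGLE_APPLICATION_CREDENTIALS variable first, then falls back to the file that gcloud auth application-default login writes, and reads the "type" field of whichever file it finds. The function name describe_adc_source is made up for this example, and the fallback path assumes Linux or macOS (Windows stores the file under %APPDATA%\gcloud instead). It does not cover every credential source; an interactive web flow, for instance, is not detected.

```python
import json
import os
from pathlib import Path


def describe_adc_source(env=None, home=None):
    """Report which local credential file would likely be picked up, and its type.

    Hypothetical helper for illustration; it only inspects local files.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        path, source = Path(key_path), "GOOGLE_APPLICATION_CREDENTIALS"
    else:
        # Assumed Linux/macOS location used by `gcloud auth application-default login`.
        path = home / ".config" / "gcloud" / "application_default_credentials.json"
        source = "gcloud application-default login"

    if not path.is_file():
        return f"no credential file found at {path}"

    info = json.loads(path.read_text())
    kind = info.get("type", "unknown")
    # Service account keys carry client_email; user credential files usually do not.
    email = info.get("client_email", "<not stored in file>")
    return f"{source}: type={kind}, account={email}"


if __name__ == "__main__":
    print(describe_adc_source())
```

A type of service_account means the job is launched as that service account; authorized_user means it is launched as the user who logged in. Whichever account it turns out to be is the one that needs bigquery.jobs.create.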
