Extracting BigQuery Data From a Shared Dataset

Is it possible to extract data (to Google Cloud Storage) from a shared dataset (where I only have view permissions) using the Python client API?

I can do this manually using the web browser, but cannot get it to work using the APIs.

I have created a project (MyProject) and a service account for MyProject to use as credentials when creating the service via the API. This account has view permissions on the shared dataset (MySharedDataset) and write permissions on my Google Cloud Storage bucket. If I attempt to run a job in my own project to extract data from the shared project:

job_data = {
        'jobReference': {
            'projectId': myProjectId,
            'jobId': str(uuid.uuid4())
        },
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': sharedProjectId,
                    'datasetId': sharedDatasetId,
                    'tableId': sharedTableId,
                },
                'destinationUris': [cloud_storage_path],
                'destinationFormat': 'AVRO'
            }
        }
    }

I get the error:

googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json returned "Value 'myProjectId' in content does not agree with value 'sharedProjectId'. This can happen when a value set through a parameter is inconsistent with a value set in the request."

Using sharedProjectId in both the jobReference and the sourceTable, I get:

googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json returned "Access Denied: Job myJobId: The user myServiceAccountEmail does not have permission to run a job in project sharedProjectId">

Using myProjectId for both, the job immediately comes back with a status of 'DONE' and no errors, but nothing has been exported. My GCS bucket is empty.
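One way to see whether a 'DONE' job really succeeded is to fetch it again with jobs.get and inspect its status object: a job can reach state 'DONE' and still have failed, with the failure recorded in status.errorResult and status.errors rather than raised as an HTTP error. A minimal sketch, assuming `service` is the authorized client from the question (the helper name is my own, but the field names follow the v2 jobs resource):

```python
def extract_job_errors(job):
    """Collect any errors recorded on a finished BigQuery job resource.

    A job can be in state 'DONE' yet have failed; the failure appears in
    status.errorResult / status.errors, not as an HTTP-level error.
    """
    status = job.get('status', {})
    errors = []
    if 'errorResult' in status:
        errors.append(status['errorResult'])
    # status.errors often repeats errorResult; keep only new entries.
    errors.extend(e for e in status.get('errors', []) if e not in errors)
    return errors

# Usage against the live API (job_id from the jobReference used on insert):
# job = service.jobs().get(projectId=myProjectId, jobId=job_id).execute()
# if job['status']['state'] == 'DONE':
#     for err in extract_job_errors(job):
#         print(err.get('reason'), err.get('message'))
```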

If this is indeed not possible using the API, is there another method/tool that can be used to automate the extraction of data from a shared dataset?

* UPDATE *

This works fine using the API explorer running under my GA login. In my code I use the following method:

service.jobs().insert(projectId=myProjectId, body=job_data).execute()

and removed the jobReference object containing the projectId:

job_data = {
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': sharedProjectId,
                    'datasetId': sharedDatasetId,
                    'tableId': sharedTableId,
                },
                'destinationUris': [cloud_storage_path],
                'destinationFormat': 'AVRO'
            }
        }
    }

but this returns the error:

Access Denied: Table sharedProjectId:sharedDatasetId.sharedTableId: The user 'serviceAccountEmail' does not have permission to export a table in dataset sharedProjectId:sharedDatasetId

My service account is now an owner on the shared dataset and has edit permissions on MyProject. Where else do permissions need to be set? Or is it possible to use the Python API with my GA login credentials rather than the service account?
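On the last point: the discovery client accepts whatever credentials you pass it, so you can run under your own user login instead of a service-account key. A sketch under the assumption that the google-auth and google-api-python-client libraries are installed and that you have run `gcloud auth application-default login` first; `make_extract_body` is just a hypothetical helper wrapping the same request body as above:

```python
def make_extract_body(src_project, src_dataset, src_table, gcs_uri,
                      dest_format='AVRO'):
    """Build the jobs.insert request body for an extract job."""
    return {
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': src_project,
                    'datasetId': src_dataset,
                    'tableId': src_table,
                },
                'destinationUris': [gcs_uri],
                'destinationFormat': dest_format,
            }
        }
    }

# With user (Application Default) credentials instead of a key file:
# import google.auth
# from googleapiclient.discovery import build
# credentials, _ = google.auth.default(
#     scopes=['https://www.googleapis.com/auth/bigquery'])
# service = build('bigquery', 'v2', credentials=credentials)
# body = make_extract_body(sharedProjectId, sharedDatasetId,
#                          sharedTableId, cloud_storage_path)
# service.jobs().insert(projectId=myProjectId, body=body).execute()
```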

* UPDATE *

Finally got it to work. How? Make sure the service account has permissions to view the dataset (and if you don't have access to check this yourself and someone tells you that it does, ask them to double-check or send you a screenshot!)
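If your own credentials can at least read the dataset's metadata, you can double-check this programmatically: datasets.get returns an `access` list whose entries carry a `role` plus a `userByEmail` or `groupByEmail` field. A small sketch (the `account_can_read` helper is my own name for the check):

```python
def account_can_read(access_entries, email):
    """True if `email` appears in a dataset's access list with a role
    that allows reading the data (READER, WRITER or OWNER)."""
    readable = {'READER', 'WRITER', 'OWNER'}
    return any(
        entry.get('role') in readable and
        email in (entry.get('userByEmail'), entry.get('groupByEmail'))
        for entry in access_entries)

# Usage against the live API (assumes an authorized `service` client):
# dataset = service.datasets().get(projectId=sharedProjectId,
#                                  datasetId=sharedDatasetId).execute()
# print(account_can_read(dataset.get('access', []), myServiceAccountEmail))
```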

After trying to reproduce the issue, I was running into parse errors. However, when I played around with the jobs.insert API on the Developer Console [2], it worked. What I did notice is that the request body below has a different format than the documentation on the website, as it uses single quotes instead of double quotes.

Here is the code that I ran to get it to work.

{
'configuration': {
    'extract': {
        'sourceTable': {
            'projectId': "sharedProjectID",
            'datasetId': "sharedDataSetID",
            'tableId': "sharedTableID"
        },
        'destinationUri': "gs://myBucket/myFile.csv"
    }
}
}

HTTP Request

POST https://www.googleapis.com/bigquery/v2/projects/myProjectId/jobs

If you are still running into problems, you can try the jobs.insert API on the website [2] or the bq command-line tool [3].

The following command can do the same thing:

bq extract sharedProjectId:sharedDataSetId.sharedTableId gs://myBucket/myFile.csv

Hope this helps.

[2] https://cloud.google.com/bigquery/docs/reference/v2/jobs/insert

[3] https://cloud.google.com/bigquery/bq-command-line-tool
