
Extracting BigQuery Data From a Shared Dataset

Is it possible to extract data (to Google Cloud Storage) from a shared dataset (where I only have view permissions) using the client APIs (Python)?

I can do this manually using the web browser, but cannot get it to work using the APIs.

I have created a project (MyProject) and a service account for MyProject to use as credentials when creating the service using the API. This account has view permissions on a shared dataset (MySharedDataset) and write permissions on my Google Cloud Storage bucket. If I attempt to run a job in my own project to extract data from the shared project:

import uuid

# The job is created in my own project; the source table lives in the shared project.
job_data = {
    'jobReference': {
        'projectId': myProjectId,
        'jobId': str(uuid.uuid4())
    },
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': sharedProjectId,
                'datasetId': sharedDatasetId,
                'tableId': sharedTableId,
            },
            'destinationUris': [cloud_storage_path],
            'destinationFormat': 'AVRO'
        }
    }
}

I get the error:

googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json returned "Value 'myProjectId' in content does not agree with value sharedProjectId'. This can happen when a value set through a parameter is inconsistent with a value set in the request.">
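Reading that message, the project in the request URL and the project in jobReference have to agree, while sourceTable is free to name a different project. The following is a minimal sketch of a body builder that keeps those two roles apart. The function name build_extract_job, its parameter names, and the sample values are hypothetical; only the dictionary layout is taken from the request above.

```python
import uuid


def build_extract_job(job_project, source_project, dataset_id, table_id,
                      destination_uri, destination_format="AVRO"):
    """Build a jobs.insert body for an extract job.

    job_project is the project the job runs in; it should also be the
    projectId passed to jobs().insert(). source_project is the project
    that owns the shared table.
    """
    return {
        "jobReference": {
            "projectId": job_project,
            "jobId": str(uuid.uuid4()),
        },
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": source_project,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "destinationUris": [destination_uri],
                "destinationFormat": destination_format,
            }
        },
    }


# Hypothetical sample values, for illustration only.
body = build_extract_job("my-project", "shared-project", "shared_dataset",
                         "shared_table", "gs://my-bucket/export.avro")
```

With a body built this way, the projectId argument of the insert call would be the same value as job_project, not the shared project.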

Using the sharedProjectId in both the jobReference and sourceTable I get:

googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/sharedProjectId/jobs?alt=json returned "Access Denied: Job myJobId: The user myServiceAccountEmail does not have permission to run a job in project sharedProjectId">

Using myProjectId for both, the job immediately comes back with a status of 'DONE' and with no errors, but nothing has been exported. My GCS bucket is empty.
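A state of 'DONE' only says that the job has finished. As far as I know, a failed job reports its failure in status.errorResult of the job resource rather than through the state, so it is worth looking there before concluding there were no errors. A small sketch for inspecting a returned job resource follows; job_failure is a hypothetical helper and the sample dictionaries are made up for illustration.

```python
def job_failure(job):
    """Return the errorResult dict of a finished job, or None if it succeeded.

    Raises ValueError if the job has not reached the DONE state yet.
    """
    status = job.get("status", {})
    if status.get("state") != "DONE":
        raise ValueError("job is still %s" % status.get("state"))
    return status.get("errorResult")


# Made-up job resources, shaped like jobs.insert / jobs.get responses.
ok_job = {"status": {"state": "DONE"}}
failed_job = {
    "status": {
        "state": "DONE",
        "errorResult": {"reason": "accessDenied", "message": "Access Denied"},
    }
}
```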

If this is indeed not possible using the API, is there another method/tool that can be used to automate the extraction of data from a shared dataset?

* UPDATE *

This works fine using the API explorer running under my GA login. In my code I use the following method:

service.jobs().insert(projectId=myProjectId, body=job_data).execute()

and removed the jobReference object containing the projectId:

job_data = {
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': sharedProjectId,
                'datasetId': sharedDatasetId,
                'tableId': sharedTableId,
            },
            'destinationUris': [cloud_storage_path],
            'destinationFormat': 'AVRO'
        }
    }
}

but this returns the error:

Access Denied: Table sharedProjectId:sharedDatasetId.sharedTableId: The user 'serviceAccountEmail' does not have permission to export a table in dataset sharedProjectId:sharedDatasetId

My service account is now an owner on the shared dataset and has edit permissions on MyProject. Where else do permissions need to be set? Or is it possible to use the Python API with my GA login credentials rather than the service account?

* UPDATE *

Finally got it to work. How? Make sure the service account has permissions to view the dataset (and if you don't have access to check this yourself and someone tells you that it does, ask them to double check/send you a screenshot!)
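If the account can read the dataset metadata at all, the check can also be scripted instead of relying on a screenshot. The sketch below assumes the dataset resource has been fetched with service.datasets().get(projectId=..., datasetId=...).execute() and looks only at grants made directly to an email address in its access list; access inherited through a group or a project-level role would not show up here. The helper name direct_roles and the sample data are hypothetical.

```python
def direct_roles(dataset, email):
    """List the roles granted directly to `email` in a dataset's access list."""
    return [
        entry["role"]
        for entry in dataset.get("access", [])
        if entry.get("userByEmail") == email and "role" in entry
    ]


# Made-up dataset resource, shaped like a datasets.get response.
dataset = {
    "access": [
        {"role": "OWNER", "userByEmail": "owner@example.com"},
        {"role": "READER",
         "userByEmail": "svc@my-project.iam.gserviceaccount.com"},
        {"role": "READER", "specialGroup": "projectReaders"},
    ]
}

roles = direct_roles(dataset, "svc@my-project.iam.gserviceaccount.com")
```

An empty list here does not prove the account lacks access, but a missing entry is a good reason to ask the dataset owner to look again.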

While trying to reproduce the issue, I ran into parse errors. I did, however, play around with the jobs.insert API on the Developer Console [2] and it worked. What I did notice is that the request code below has a different format than the documentation on the website, as it has single quotes instead of double quotes.

Here is the code that I ran to get it to work.

{
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': "sharedProjectID",
                'datasetId': "sharedDataSetID",
                'tableId': "sharedTableID"
            },
            'destinationUri': "gs://myBucket/myFile.csv"
        }
    }
}

HTTP Request

POST https://www.googleapis.com/bigquery/v2/projects/myProjectId/jobs
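From Python, the same request goes through service.jobs().insert(projectId=myProjectId, body=...).execute(), as in the question. Because the insert call can return before the export has finished, it may help to poll the job afterwards. The following is a rough sketch, assuming service is the googleapiclient BigQuery v2 service object; wait_for_job and its parameters are hypothetical names, and the sleep argument exists only so the delay can be replaced in tests.

```python
import time


def wait_for_job(service, project_id, job_id, poll_seconds=2.0, max_polls=30,
                 sleep=time.sleep):
    """Poll jobs.get until the job reaches DONE, then return the job resource.

    Raises TimeoutError if the job is not DONE after max_polls attempts.
    """
    for _ in range(max_polls):
        job = service.jobs().get(projectId=project_id, jobId=job_id).execute()
        if job["status"]["state"] == "DONE":
            return job
        sleep(poll_seconds)
    raise TimeoutError("job %s not done after %d polls" % (job_id, max_polls))
```

The job ID to pass in can be read from the insert response under jobReference.jobId, and project_id should be the project the job was inserted into (myProjectId here), not the project that owns the table.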

If you are still running into problems, you can try the jobs.insert API on the website [2] or try the bq command-line tool [3].

The following command can do the same thing:

bq extract sharedProjectId:sharedDataSetId.sharedTableId gs://myBucket/myFile.csv

Hope this helps.

[2] https://cloud.google.com/bigquery/docs/reference/v2/jobs/insert

[3] https://cloud.google.com/bigquery/bq-command-line-tool

