
Apache Beam Dataflow BigQuery

How can I get the list of tables in a Google BigQuery dataset using Apache Beam with the DataflowRunner?

I can't find how to list the tables in a specified dataset. I want to migrate tables from a dataset located in the US to one in the EU using Dataflow's parallel processing programming model.

Import the library

from google.cloud import bigquery

Create a BigQuery client

client = bigquery.Client(project='your_project_name')

Prepare a reference to the dataset

dataset_ref = client.dataset('your_data_set_name')

Make the API request

tables = list(client.list_tables(dataset_ref))
if tables:
    for table in tables:
        print('\t{}'.format(table.table_id))

Reference: https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets
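To hand the result to a Beam pipeline, it can help to wrap the steps above in a function that returns fully qualified table names. The following is only a minimal sketch: the function name `list_table_specs` is hypothetical, and it assumes a version of google-cloud-bigquery whose `list_tables` accepts a dataset ID string and whose list items expose `project`, `dataset_id`, and `table_id`.

```python
def list_table_specs(client, dataset_id):
    """Return 'project:dataset.table' strings for every table in a dataset.

    `client` is expected to be a google.cloud.bigquery.Client (assumption);
    `dataset_id` is a dataset ID string such as 'your_data_set_name'.
    """
    return [
        "{}:{}.{}".format(item.project, item.dataset_id, item.table_id)
        for item in client.list_tables(dataset_id)
    ]
```

With a real client this would be called as `list_table_specs(bigquery.Client(project='your_project_name'), 'your_data_set_name')`. The `project:dataset.table` form is the one Beam's BigQuery connectors accept for their `table` argument.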

You can try the google-cloud-examples Maven repo. It has a class named BigQuerySnippets that makes an API call to get a table's metadata, from which you can fetch the schema. Please note that the API quota limit is a maximum of 6 concurrent requests per second.
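For comparison, the Python client offers a similar metadata call. This is a rough sketch rather than a port of BigQuerySnippets: the helper name `describe_schema` is hypothetical, and it assumes `client` is a `google.cloud.bigquery.Client` and `table_id` is a `project.dataset.table` string.

```python
def describe_schema(client, table_id):
    """Return (name, type, mode) for each column of the given table.

    Assumes `client` is a google.cloud.bigquery.Client and `table_id`
    is a 'project.dataset.table' string.
    """
    table = client.get_table(table_id)  # one API request per table
    return [(field.name, field.field_type, field.mode) for field in table.schema]
```

Because each call is a separate API request, looping over many tables counts against the quota mentioned above.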

Dataflow is meant for building pipelines, so it does not include a way to make this kind of API request. You have to use the BigQuery Java Client Library to get the list of tables and then provide it to your Apache Beam pipeline.

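import com.google.api.gax.paging.Page; // package used by recent client versions
import com.google.cloud.bigquery.*;

// Client built from the default credentials and project (assumption)
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
DatasetId datasetId = DatasetId.of(projectId, datasetName);
Page<Table> tables = bigquery.listTables(datasetId, BigQuery.TableListOption.pageSize(100));
for (Table table : tables.iterateAll()) {
  // do something with each table, e.g. read table.getTableId().getTable()
}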
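In the Python SDK, the same idea (list the tables first, then feed them to the pipeline) might look roughly like the sketch below. The function names `plan_copies` and `add_copy_steps` are hypothetical. The sketch assumes that both datasets live in the same project, that the destination tables already exist with matching schemas, and that `apache-beam[gcp]` is installed. Region-specific staging requirements for a US to EU copy are not addressed here.

```python
def plan_copies(project, source_dataset, target_dataset, table_ids):
    """Pair each source table with its destination as 'project:dataset.table'."""
    return [
        (
            "{}:{}.{}".format(project, source_dataset, table_id),
            "{}:{}.{}".format(project, target_dataset, table_id),
        )
        for table_id in table_ids
    ]


def add_copy_steps(pipeline, copies):
    """Attach one read/write branch per table to an existing Beam pipeline."""
    import apache_beam as beam  # imported lazily; requires apache-beam[gcp]

    for source, target in copies:
        (
            pipeline
            # Step labels must be unique, so the table name is part of each label.
            | "Read {}".format(source) >> beam.io.ReadFromBigQuery(table=source)
            | "Write {}".format(target) >> beam.io.WriteToBigQuery(
                table=target,
                # Assumption: the destination table already exists.
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

The table IDs would come from the client library call made before the pipeline is constructed, and `add_copy_steps` would be called inside a `with beam.Pipeline(options=...) as pipeline:` block. Since the read step typically stages data in Cloud Storage, the pipeline options will probably need a `temp_location`.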
