Tags: Apache Beam, Dataflow, BigQuery

How can I get the list of tables from a Google BigQuery dataset using Apache Beam with DataflowRunner?

I can't figure out how to list the tables of a specified dataset. I want to migrate tables from a dataset located in the US to one in the EU using Dataflow's parallel processing programming model.

Import the library

from google.cloud import bigquery

Prepare a BigQuery client

client = bigquery.Client(project='your_project_name')

Prepare a reference to the dataset

dataset_ref = client.dataset('your_data_set_name')

Make the API request

tables = list(client.list_tables(dataset_ref))
if tables:
    for table in tables:
        print('\t{}'.format(table.table_id))

Reference: https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets
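
To connect this back to the question, here is a minimal sketch of how the table list above could drive a Dataflow migration pipeline with the Apache Beam Python SDK. All project, dataset, region, and bucket names are placeholders, and it assumes the destination EU tables already exist with matching schemas:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import bigquery

project = 'your_project_name'      # placeholder
src_dataset = 'your_us_dataset'    # placeholder: source dataset in the US
dst_dataset = 'your_eu_dataset'    # placeholder: destination dataset in the EU

# List the source tables at pipeline-construction time
client = bigquery.Client(project=project)
table_ids = [t.table_id for t in client.list_tables(src_dataset)]

options = PipelineOptions(
    runner='DataflowRunner',
    project=project,
    region='europe-west1',                 # placeholder region
    temp_location='gs://your_bucket/tmp')  # placeholder bucket

with beam.Pipeline(options=options) as p:
    for table_id in table_ids:
        (p
         | 'Read {}'.format(table_id) >> beam.io.ReadFromBigQuery(
               table='{}:{}.{}'.format(project, src_dataset, table_id))
         | 'Write {}'.format(table_id) >> beam.io.WriteToBigQuery(
               '{}:{}.{}'.format(project, dst_dataset, table_id),
               create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
               write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))

Each table becomes an independent read/write branch of the pipeline, so Dataflow can copy the tables in parallel.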

You can try the google-cloud-examples Maven repo. There's a class named BigQuerySnippets that makes an API call to get the table metadata, from which you can fetch the schema. Please note the API quota limit of a maximum of 6 concurrent requests per second.

The purpose of Dataflow is to create and run pipelines, so the ability to make arbitrary API requests such as listing tables is not included. You have to use the BigQuery Java client library to get the table list and then provide it to your Apache Beam pipeline.

import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.*;

// Build a client with application-default credentials
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

// projectId and datasetName are your own values
DatasetId datasetId = DatasetId.of(projectId, datasetName);
Page<Table> tables = bigquery.listTables(datasetId, BigQuery.TableListOption.pageSize(100));
for (Table table : tables.iterateAll()) {
  // do something with each table, e.g. table.getTableId().getTable()
}
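
As with the Python sketch above, the table list is obtained at pipeline-construction time; each table ID can then become its own read and write step in the Beam pipeline, for example via the Java SDK's BigQueryIO connector.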
