
BigQueryIO read get TableSchema

What I want to do is read an existing table and generate a new table that has the same schema as the original plus a few extra columns (computed from some columns of the original table). The original table's schema may be extended without notice to me (the fields I use in my Dataflow job won't change), so I would like to always read the schema instead of defining some custom class that contains it.

In Dataflow SDK 1.x, I can get the TableSchema via

final DataflowPipelineOptions options = ...
final String projectId = ...
final String dataset = ...
final String table = ...

final TableSchema schema = new BigQueryServicesImpl()
    .getDatasetService(options)
    .getTable(projectId, dataset, table)
    .getSchema();

For Dataflow SDK 2.x, BigQueryServicesImpl has become a package-private class.

I read the responses in Get TableSchema from BigQuery result PCollection<TableRow>, but I'd prefer not to make a separate query to BigQuery. As that answer is now almost 2 years old, are there other thoughts or ideas from the SO community?

Due to how BigQueryIO is set up now, it needs to query the table schema before the pipeline begins to run. This is a good feature idea, but it's not feasible within a single pipeline: in the example you linked, the table schema is queried before running the pipeline.

If new columns are added, then unfortunately the pipeline must be relaunched.
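One workaround is to fetch the schema yourself at pipeline-construction time with the plain BigQuery API client (the same com.google.api.services.bigquery.model classes BigQueryIO uses) and append your computed columns before handing the schema to BigQueryIO. Below is a minimal sketch, assuming the google-api-services-bigquery client and application-default credentials are available; the helper name fetchTableSchema and the application name are placeholders, not part of any SDK:

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import com.google.api.services.bigquery.model.TableSchema;

public class SchemaFetcher {

  // Looks up the table via the BigQuery REST API and returns its schema.
  // This runs at pipeline-construction time, before the pipeline is launched.
  static TableSchema fetchTableSchema(String projectId, String dataset, String table)
      throws Exception {
    GoogleCredential credential = GoogleCredential.getApplicationDefault()
        .createScoped(BigqueryScopes.all());
    Bigquery bigquery = new Bigquery.Builder(
            GoogleNetHttpTransport.newTrustedTransport(),
            JacksonFactory.getDefaultInstance(),
            credential)
        .setApplicationName("schema-fetcher")  // placeholder application name
        .build();
    return bigquery.tables()
        .get(projectId, dataset, table)
        .execute()
        .getSchema();
  }
}

The returned TableSchema's field list can then be copied, the extra computed columns appended, and the extended schema passed to BigQueryIO's withSchema(...). For example (the column name and type here are placeholders):

TableSchema original = SchemaFetcher.fetchTableSchema(projectId, dataset, table);
List<TableFieldSchema> fields = new ArrayList<>(original.getFields());
fields.add(new TableFieldSchema().setName("my_computed_column").setType("FLOAT"));
TableSchema extended = new TableSchema().setFields(fields);

The lookup still happens before the pipeline runs, so a schema change on the source table still requires relaunching the job.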
