How to synchronize data between two BigQuery tables

We are planning to write a workflow that migrates data in BigQuery via the Data Transfer Service, by performing a copy/insert from source_project:source_dataset.source_table_1 into destination_project:destination_dataset.destination_table_1.

As far as I know, the Data Transfer Service would cover the following needs:

  • Create destination_table_1 and fill it.
  • Update destination_table_1 on a schedule when rows are added to source_table_1.

What would happen if I delete some rows from source_table_1, or even delete source_table_1 itself? The expected behavior is a perfectly synchronized data state between source_table_1 and destination_table_1. Does the Data Transfer Service handle this case?

Here is what we would be implementing:

    from google.cloud import bigquery_datatransfer

    transfer_client = bigquery_datatransfer.DataTransferServiceClient()

    # Project and dataset that receive the copy.
    destination_project_id = "my-destination-project"
    destination_dataset_id = "my_destination_dataset"
    # Project and dataset to copy from.
    source_project_id = "my-source-project"
    source_dataset_id = "my_source_dataset"

    # Describe a scheduled dataset copy that runs once a day.
    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=destination_dataset_id,
        display_name="Your Dataset Copy Name",
        data_source_id="cross_region_copy",
        params={
            "source_project_id": source_project_id,
            "source_dataset_id": source_dataset_id,
        },
        schedule="every 24 hours",
    )

    # Register the transfer config in the destination project.
    transfer_config = transfer_client.create_transfer_config(
        parent=transfer_client.common_project_path(destination_project_id),
        transfer_config=transfer_config,
    )
    print(f"Created transfer config: {transfer_config.name}")

I think copy jobs are a better choice for this use case:

https://cloud.google.com/bigquery/docs/copying-datasets

Copy jobs will perform whatever operations are needed to ensure that the destination table is completely in sync with the source table:

https://cloud.google.com/bigquery/quotas#copy_jobs
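
For a single table, one way to sketch this with the google-cloud-bigquery client is a copy job that uses the WRITE_TRUNCATE write disposition. Each run then replaces the destination's contents with the source's current contents, so rows deleted from source_table_1 also disappear from destination_table_1. The helper name sync_table and its return values are my own, not part of any library. Dropping the destination when the source table no longer exists is an assumption about what "in sync" should mean here. It also assumes the two datasets are in locations between which a table copy job is allowed.

```python
try:
    from google.api_core.exceptions import NotFound
    from google.cloud import bigquery
except ImportError:
    # Fallback so the control flow can be dry-run with a stub client
    # when google-cloud-bigquery is not installed.
    bigquery = None

    class NotFound(Exception):
        """Stand-in for google.api_core.exceptions.NotFound."""


def sync_table(client, source_table_id, destination_table_id):
    """Hypothetical helper: mirror one source table into one destination table.

    Returns "copied" if the destination was overwritten from the source,
    or "deleted" if the source is gone and the destination was dropped.
    """
    try:
        # Raises NotFound if the source table has been deleted.
        client.get_table(source_table_id)
    except NotFound:
        # Assumption: a deleted source means the destination should go too.
        client.delete_table(destination_table_id, not_found_ok=True)
        return "deleted"

    job_config = None
    if bigquery is not None:
        # WRITE_TRUNCATE overwrites the destination table instead of
        # appending, so removed rows do not linger in the destination.
        job_config = bigquery.CopyJobConfig(
            write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE
        )

    copy_job = client.copy_table(
        source_table_id, destination_table_id, job_config=job_config
    )
    copy_job.result()  # Block until the copy job finishes.
    return "copied"
```

With a real client this would be called as sync_table(bigquery.Client(project="my-destination-project"), "my-source-project.my_source_dataset.source_table_1", "my-destination-project.my_destination_dataset.destination_table_1"), and run from whatever scheduler you already use. Note that this copies the whole table on every run, so it is simple but not incremental.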
