We are planning to write a workflow whose purpose is to migrate data on BigQuery via the transfer service, performing a copy/insert from source_project:source_dataset.source_table_1 into destination_project:destination_dataset.destination_table_1.
As far as I know, the data transfer would answer the following needs: create destination_table_1 and fill it if source_table_1 has rows added to it. What would happen if I delete some rows from source_table_1, or even delete source_table_1 itself? The expected behavior would be a perfectly synchronized data state between source_table_1 and destination_table_1. Does the Data Transfer Service handle this case?
Here is what we would be implementing:
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

destination_project_id = "my-destination-project"
destination_dataset_id = "my_destination_dataset"
source_project_id = "my-source-project"
source_dataset_id = "my_source_dataset"

# Schedule a dataset copy that runs every 24 hours.
transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id=destination_dataset_id,
    display_name="Your Dataset Copy Name",
    data_source_id="cross_region_copy",
    params={
        "source_project_id": source_project_id,
        "source_dataset_id": source_dataset_id,
    },
    schedule="every 24 hours",
)

# Register the config in the destination project.
transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path(destination_project_id),
    transfer_config=transfer_config,
)
print(f"Created transfer config: {transfer_config.name}")
I think that copy jobs are a better choice for this use case:
https://cloud.google.com/bigquery/docs/copying-datasets
A copy job overwrites the destination table, so after each run it is completely in sync with the source table, which also covers deleted rows.
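To make the overwrite semantics concrete, here is a minimal sketch of a table-level copy job using the google.cloud.bigquery client; the project, dataset, and table names are placeholders, and it assumes both datasets are in locations that allow copy jobs between them. WRITE_TRUNCATE replaces the destination table's contents on every run, so rows deleted from the source disappear from the destination too:

from google.cloud import bigquery

client = bigquery.Client(project="my-destination-project")

source_table = "my-source-project.my_source_dataset.source_table_1"
destination_table = "my-destination-project.my_destination_dataset.destination_table_1"

# WRITE_TRUNCATE replaces the destination contents on every run,
# so deletions in the source are reflected after the copy.
job_config = bigquery.CopyJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

copy_job = client.copy_table(source_table, destination_table, job_config=job_config)
copy_job.result()  # Wait for the copy to complete.
print(f"Copied {source_table} to {destination_table}")

Running this on a schedule (for example from the workflow itself) gives the "perfectly synchronized" state asked about above, at the cost of rewriting the whole table each time.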