
Read Spanner data from a table that is simultaneously being written to

I'm copying Spanner data to BigQuery through a Dataflow job that is scheduled to run every 15 minutes. The problem is that if the data is read from a Spanner table that is also being written to at the same time, some of the records are missed when copying to BigQuery.

I'm using readOnlyTransaction() while reading the Spanner data. Is there any other precaution I should take when doing this?

It is recommended to use Cloud Spanner commit timestamps to populate columns like update_date. Commit timestamps allow applications to determine the exact ordering of mutations.

By using commit timestamps for update_date and specifying an exact-timestamp read, the Dataflow job will be able to find all records written/committed since the previous run (a sketch follows the links below).

https://cloud.google.com/spanner/docs/commit-timestamp

https://cloud.google.com/spanner/docs/timestamp-bounds
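A minimal sketch of the write side, assuming an illustrative events table (the table and column names other than update_date are not from the question): the TIMESTAMP column is declared with allow_commit_timestamp=true, and writers pass Value.COMMIT_TIMESTAMP so Spanner fills in the commit timestamp itself.

```java
// Hypothetical schema (Cloud Spanner GoogleSQL DDL):
//   CREATE TABLE events (
//     event_id    STRING(36) NOT NULL,
//     payload     STRING(MAX),
//     update_date TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
//   ) PRIMARY KEY (event_id);

import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import com.google.cloud.spanner.Value;
import java.util.Collections;

public class CommitTimestampWrite {
  public static void main(String[] args) {
    Spanner spanner = SpannerOptions.newBuilder().build().getService();
    DatabaseClient db = spanner.getDatabaseClient(
        DatabaseId.of("my-project", "my-instance", "my-database"));

    // Value.COMMIT_TIMESTAMP tells Spanner to populate update_date with the
    // transaction's commit timestamp when the mutation is applied.
    Mutation m = Mutation.newInsertOrUpdateBuilder("events")
        .set("event_id").to("evt-123")
        .set("payload").to("hello")
        .set("update_date").to(Value.COMMIT_TIMESTAMP)
        .build();
    db.write(Collections.singletonList(m));

    spanner.close();
  }
}
```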

if the data is read from a Spanner table which is also being written at the same time, some of the records get missed while copying to BigQuery

This is how transactions work. They present a 'snapshot view' of the database at the time the transaction was created, so any rows written after this snapshot is taken will not be included.

As @rose-liu mentioned, using commit timestamps on your rows, and keeping track of the timestamp when you last exported (available from the ReadOnlyTransaction object), will allow you to accurately select 'new/updated rows since the last export'.
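A minimal sketch of that incremental read with the Java client, assuming the same illustrative events table as above; where the last-export watermark is persisted (a metadata table, a GCS object, etc.) is left out here:

```java
import com.google.cloud.Timestamp;
import com.google.cloud.spanner.DatabaseClient;
import com.google.cloud.spanner.DatabaseId;
import com.google.cloud.spanner.ReadOnlyTransaction;
import com.google.cloud.spanner.ResultSet;
import com.google.cloud.spanner.Spanner;
import com.google.cloud.spanner.SpannerOptions;
import com.google.cloud.spanner.Statement;
import com.google.cloud.spanner.TimestampBound;

public class IncrementalExport {
  public static void main(String[] args) {
    Spanner spanner = SpannerOptions.newBuilder().build().getService();
    DatabaseClient db = spanner.getDatabaseClient(
        DatabaseId.of("my-project", "my-instance", "my-database"));

    // Watermark saved by the previous run; hard-coded here for illustration.
    Timestamp lastExport = Timestamp.parseTimestamp("2024-01-01T00:00:00Z");

    // A read-only transaction pins a single snapshot: every row visible in it
    // was committed at or before the transaction's read timestamp.
    try (ReadOnlyTransaction txn = db.readOnlyTransaction(TimestampBound.strong())) {
      ResultSet rs = txn.executeQuery(
          Statement.newBuilder(
                  "SELECT event_id, payload, update_date FROM events "
                      + "WHERE update_date > @last")
              .bind("last").to(lastExport)
              .build());
      while (rs.next()) {
        // ... hand each row to the BigQuery write step ...
      }

      // Persist the transaction's read timestamp as the new watermark so the
      // next run picks up exactly where this one left off.
      Timestamp newWatermark = txn.getReadTimestamp();
      System.out.println("Next export starts after: " + newWatermark);
    }

    spanner.close();
  }
}
```

The same idea carries over to the Dataflow job itself: pin the read to a single timestamp bound (the Beam SpannerIO read transform accepts a TimestampBound) and carry that read timestamp forward as the lower bound for the next 15-minute run.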
