
Read spanner data from a table which is simultaneously being written

Original post: 2018-12-17 12:51:11. Tags: google-cloud-platform, google-bigquery, google-cloud-dataflow, apache-beam, google-cloud-spanner

I'm copying Spanner data to BigQuery through a Dataflow job. The job is scheduled to run every 15 minutes. The problem is that if the data is read from a Spanner table that is being written to at the same time, some of the records are missed when copying to BigQuery.

I'm using readOnlyTransaction() while reading Spanner data. Is there any other precaution that I must take while doing this activity?

2 answers

It is recommended to use Cloud Spanner commit timestamps to populate columns like update_date. Commit timestamps allow applications to determine the exact ordering of mutations.

By using commit timestamps for update_date and reading at an exact timestamp (a timestamp bound), the Dataflow job can find all records committed since the previous run.
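As a rough sketch of what that could look like, the snippet below builds the pieces involved: a table definition with a commit-timestamp column, a write statement that fills it, and a query that selects only the rows committed between two read timestamps. The table name Orders, the columns order_id and payload, and the helper incremental_query are assumptions made up for illustration; only update_date comes from the answer. The snippet only builds strings and parameters, it does not talk to Spanner.

```python
from datetime import datetime, timezone

# Hypothetical schema (table and column names other than update_date are
# assumptions). The OPTIONS clause allows the column to receive the
# commit timestamp of the writing transaction.
DDL = """
CREATE TABLE Orders (
  order_id    STRING(36) NOT NULL,
  payload     STRING(MAX),
  update_date TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
) PRIMARY KEY (order_id)
"""

# Writers set the column with PENDING_COMMIT_TIMESTAMP() so that Spanner,
# not the client clock, decides the value.
WRITE_DML = (
    "UPDATE Orders SET payload = @payload, "
    "update_date = PENDING_COMMIT_TIMESTAMP() "
    "WHERE order_id = @order_id"
)


def incremental_query(previous_read_ts: datetime, current_read_ts: datetime):
    """Return (sql, params) selecting rows committed in (previous, current]."""
    if previous_read_ts >= current_read_ts:
        raise ValueError("current_read_ts must be later than previous_read_ts")
    sql = (
        "SELECT order_id, payload, update_date FROM Orders "
        "WHERE update_date > @previous_read_ts "
        "AND update_date <= @current_read_ts"
    )
    params = {
        "previous_read_ts": previous_read_ts,
        "current_read_ts": current_read_ts,
    }
    return sql, params


# Example: two runs 15 minutes apart.
previous = datetime(2018, 12, 17, 12, 30, tzinfo=timezone.utc)
current = datetime(2018, 12, 17, 12, 45, tzinfo=timezone.utc)
sql, params = incremental_query(previous, current)
```

In a real job the query would be executed inside a read-only transaction whose timestamp bound is current_read_ts, so that the upper edge of the window and the snapshot being read agree.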

https://cloud.google.com/spanner/docs/commit-timestamp

https://cloud.google.com/spanner/docs/timestamp-bounds

"if the data is read from a Spanner table which is also being written at the same time, some of the records get missed while copying to BigQuery"

This is how read-only transactions work. They present a snapshot view of the database as of the transaction's read timestamp, so any rows committed after that timestamp are not included in the results.
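The snapshot behaviour can be illustrated with a toy in-memory model. This is not Spanner code: ToyTable, its methods, and the integer timestamps are all invented for the example, and the only point is that a read at a given timestamp never sees rows committed later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    commit_ts: int  # integer stand-in for a Spanner commit timestamp


class ToyTable:
    """Hypothetical in-memory table that remembers when each row was committed."""

    def __init__(self):
        self._rows = []

    def write(self, key, commit_ts):
        self._rows.append(Row(key, commit_ts))

    def snapshot_read(self, read_ts):
        # A snapshot at read_ts contains only rows committed at or before it.
        return [row.key for row in self._rows if row.commit_ts <= read_ts]


table = ToyTable()
table.write("a", commit_ts=10)
table.write("b", commit_ts=20)

read_ts = 25                      # the export job's snapshot is taken here
table.write("c", commit_ts=30)    # committed while the export is still running

visible = table.snapshot_read(read_ts)
print(visible)  # ['a', 'b']
```

Row "c" is not lost, it is simply outside this snapshot, which is why the next run has to know where the previous one stopped.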

As @rose-liu mentioned, using commit timestamps on your rows and keeping track of the timestamp of your last export (available from the ReadOnlyTransaction object) will allow you to accurately select the rows that are new or updated since the last export.
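A minimal sketch of that bookkeeping, under stated assumptions: rows are modelled as (key, commit_ts) pairs with integer timestamps, and export_since is a hypothetical helper, not part of any Spanner or Beam API. Each run exports the rows committed after the previous run's read timestamp and up to its own, then hands its read timestamp to the next run.

```python
def export_since(rows, last_exported_ts, read_ts):
    """Return (batch, new_watermark) for one export run.

    rows: iterable of (key, commit_ts) pairs, a stand-in for the table.
    last_exported_ts: read timestamp of the previous run.
    read_ts: read timestamp of this run's read-only transaction.
    """
    batch = [
        key
        for key, commit_ts in rows
        if last_exported_ts < commit_ts <= read_ts
    ]
    # The read timestamp of this run becomes the lower bound of the next one.
    return batch, read_ts


rows = [("a", 10), ("b", 20)]
watermark = 0  # nothing exported yet

# Run 1 reads at timestamp 25.
batch1, watermark = export_since(rows, watermark, read_ts=25)

# Writes that land during or after run 1.
rows.append(("c", 30))
rows.append(("d", 40))

# Run 2 reads at timestamp 45 and starts where run 1 stopped.
batch2, watermark = export_since(rows, watermark, read_ts=45)

print(batch1)  # ['a', 'b']
print(batch2)  # ['c', 'd']
```

Because the windows are half-open and adjacent, every row falls into exactly one run. In a scheduled job the watermark would need to be stored somewhere durable between runs; where to store it is left open here.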





Answer 1: score 3, accepted, 2018-12-18 20:39:19

Answer 2: score 0, 2018-12-21 13:02:07
