Why use a centralized timestamp oracle in MVCC distributed databases?

Some MVCC-based distributed databases (e.g., TiDB) use a centralized timestamp oracle to hand out monotonically increasing timestamps to transactions (e.g., begin timestamps and commit timestamps). Transactions then read data according to their begin timestamp, i.e., they can only read data versions that were committed before they began.
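The setup described above can be sketched in a few lines. This is a toy model, not TiDB's implementation: `TimestampOracle` and `MVCCStore` are hypothetical names, the oracle is just a locked counter, and the store applies the visibility rule "read the newest version whose commit timestamp is earlier than my begin timestamp".

```python
import itertools
import threading
from typing import Dict, List, Optional, Tuple


class TimestampOracle:
    """Hypothetical centralized oracle: hands out strictly increasing integers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_ts(self) -> int:
        # The lock keeps the sequence strictly increasing even with concurrent callers.
        with self._lock:
            return next(self._counter)


class MVCCStore:
    """Toy multi-version store: each key maps to a list of (commit_ts, value)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, int]]] = {}

    def write(self, key: str, value: int, commit_ts: int) -> None:
        self._versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key: str, start_ts: int) -> Optional[int]:
        # Visible versions are those committed strictly before the reader began.
        visible = [p for p in self._versions.get(key, []) if p[0] < start_ts]
        if not visible:
            return None
        return max(visible, key=lambda p: p[0])[1]


tso = TimestampOracle()
store = MVCCStore()
store.write("X", 0, tso.next_ts())     # committed at ts=1
reader_start = tso.next_ts()           # reader begins at ts=2
store.write("X", 1, tso.next_ts())     # committed at ts=3
print(store.read("X", reader_start))   # 0: the ts=3 version is invisible
print(store.read("X", tso.next_ts()))  # 1: a later reader sees it
```

Because every timestamp comes from one counter, "committed before I began" in timestamp order always agrees with the order in which the oracle was consulted.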

A centralized timestamp oracle is commonly believed to limit scalability, and it also increases transaction latency, because every transaction must first contact the remote oracle to obtain a timestamp. For example, suppose a cluster has three database nodes A, B, and C and a timestamp oracle T. When A executes a transaction, it must ask T for the begin and commit timestamps.

So my question is: why use a centralized timestamp oracle rather than the physical clock of the local machine? Returning to the example above, can A simply use its local clock time as the timestamp of its transactions? If not, why not?

I realized I had this question because I did not understand the relationship between timestamps, transaction isolation levels, and consistency in distributed systems.

The answer is that using local physical clocks can violate consistency in the distributed-systems sense, but it does not affect the transaction isolation provided by concurrency control. Take the following two transactions, executed on different nodes, as an example.

Tx-A: W(X{0->1}) W(Y{0->1})

Tx-B: R(X=0) R(Y=0)

In real (physical) time, Tx-B executes after Tx-A has committed. However, because Tx-B runs on a node whose clock is slow, that node's local clock gives Tx-B a timestamp earlier than Tx-A's commit timestamp. Therefore, under MVCC-based concurrency control, Tx-B cannot see Tx-A's writes and reads X=0 and Y=0.
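The scenario can be reproduced with a small simulation. This is a sketch under stated assumptions: `Node`, its `skew` field, the 5-second offset, and the real-time instants 100 and 101 are all hypothetical values chosen for illustration; each node simply stamps transactions with `real_time + skew`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A database node whose local clock is off by `skew` seconds (hypothetical model)."""
    name: str
    skew: float

    def local_ts(self, real_time: float) -> float:
        # The timestamp this node would assign if it trusted its own clock.
        return real_time + self.skew


# key -> list of (commit_ts, value); initial versions committed at ts 0
versions: Dict[str, List[Tuple[float, int]]] = {"X": [(0.0, 0)], "Y": [(0.0, 0)]}


def read(key: str, start_ts: float) -> int:
    # MVCC visibility rule: newest version committed before start_ts.
    visible = [p for p in versions[key] if p[0] < start_ts]
    return max(visible, key=lambda p: p[0])[1]


node_a = Node("A", skew=0.0)   # accurate clock
node_b = Node("B", skew=-5.0)  # clock 5 seconds slow (assumed value)

# Real time 100: Tx-A commits X=1, Y=1 on node A.
commit_ts_a = node_a.local_ts(100.0)
versions["X"].append((commit_ts_a, 1))
versions["Y"].append((commit_ts_a, 1))

# Real time 101: Tx-B starts on node B, strictly after Tx-A committed.
start_ts_b = node_b.local_ts(101.0)  # 96.0, which is below Tx-A's 100.0
print(read("X", start_ts_b), read("Y", start_ts_b))  # prints "0 0"
```

Note that Tx-B still sees a consistent snapshot (both old values, never X=1 with Y=0); it is only "stale" relative to real time.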

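This violates consistency in the distributed-systems sense (e.g., linearizability), because Tx-B fails to observe a write that completed before it started. It does not, however, violate transaction isolation (e.g., serializability): serializability only requires the result to be equivalent to some serial order, and both [Tx-A -> Tx-B] and [Tx-B -> Tx-A] are valid serial orders. Here the result matches [Tx-B -> Tx-A].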
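The distinction can be checked mechanically with a brute-force sketch (an illustration, not a general-purpose checker): run Tx-A and Tx-B in every serial order, see which orders reproduce Tx-B's observed reads, and then ask whether the real-time order [Tx-A -> Tx-B] is among them. The names `run_serially` and `OBSERVED_B_READS` are hypothetical, and only Tx-B's reads are compared because Tx-A performs no reads.

```python
from itertools import permutations
from typing import Dict, List

# Each transaction is a list of operations: ("W", key, value) or ("R", key).
TX_A: List[tuple] = [("W", "X", 1), ("W", "Y", 1)]
TX_B: List[tuple] = [("R", "X"), ("R", "Y")]
OBSERVED_B_READS = [0, 0]  # what Tx-B actually saw in the slow-clock scenario


def run_serially(order: List[str]) -> List[int]:
    """Execute transactions one after another; return the values Tx-B read."""
    txs = {"A": TX_A, "B": TX_B}
    db: Dict[str, int] = {"X": 0, "Y": 0}
    b_reads: List[int] = []
    for name in order:
        for op in txs[name]:
            if op[0] == "W":
                db[op[1]] = op[2]
            else:  # only Tx-B issues reads
                b_reads.append(db[op[1]])
    return b_reads


matching = [o for o in permutations("AB") if run_serially(list(o)) == OBSERVED_B_READS]
real_time_order = ("A", "B")  # Tx-A committed before Tx-B began

print("serializable:", bool(matching), matching)
print("respects real-time order:", real_time_order in matching)
```

Running it reports that the history is serializable (it matches [Tx-B -> Tx-A]) but does not respect the real-time order, which is exactly the gap a centralized oracle closes.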

Jepsen explains the relationship between these two kinds of consistency well.
