
Sequence Generator / Auto Increment using Cassandra 3.0

I read a lot of Cassandra's documentation and checked the counter changes and the like. But it seems that Cassandra does not ship with a default, standard way to generate incremental sequences on the fly.

All I found is using the IF clause to do a compare-and-set (what Cassandra calls a lightweight transaction).

This way one can check whether a document exists and, if it does not, create it. Since this runs a quorum-based consensus protocol (Paxos) across the replicas in the cluster, it should be easy to use and safe, but with high latency.
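The read-then-compare-and-set loop described above can be sketched in Python. This rests on several assumptions. The `sequences` table, its `name`/`next_id` columns, and `FakeSequenceTable` are hypothetical. An in-memory class guarded by a lock stands in for Cassandra so the example runs without a cluster, and the comments show the CQL each method would correspond to. Against a real cluster, each conditional statement returns an `[applied]` column that plays the role of the boolean returned here.

```python
import threading
from typing import Optional


class FakeSequenceTable:
    """Hypothetical in-memory stand-in for
    CREATE TABLE sequences (name text PRIMARY KEY, next_id bigint);
    The lock emulates the per-partition linearizability of a lightweight transaction."""

    def __init__(self) -> None:
        self._rows = {}  # name -> next_id
        self._lock = threading.Lock()

    def read(self, name: str) -> Optional[int]:
        # SELECT next_id FROM sequences WHERE name = ?
        with self._lock:
            return self._rows.get(name)

    def insert_if_not_exists(self, name: str, value: int) -> bool:
        # INSERT INTO sequences (name, next_id) VALUES (?, ?) IF NOT EXISTS
        with self._lock:
            if name in self._rows:
                return False
            self._rows[name] = value
            return True

    def update_if(self, name: str, new: int, expected: int) -> bool:
        # UPDATE sequences SET next_id = ? WHERE name = ? IF next_id = ?
        with self._lock:
            if self._rows.get(name) != expected:
                return False
            self._rows[name] = new
            return True


def next_id(table: FakeSequenceTable, name: str, step: int = 1) -> int:
    """Read-before-write plus compare-and-set, retried until it wins.
    Returns the first ID of the claimed range [current, current + step)."""
    while True:
        current = table.read(name)
        if current is None:
            # First use of this sequence: create the row. If another client
            # won that race, simply loop and read its value instead.
            table.insert_if_not_exists(name, 1)
            continue
        if table.update_if(name, current + step, current):
            return current
        # Lost the CAS to a concurrent writer: re-read and retry.
```

Every call costs at least one read plus one conditional write, and under contention the loop may retry several times, which is where the latency comes from.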

To circumvent this latency, one can generate (reserve) a thousand IDs at once by incrementing nextSequenceId by a thousand instead of one. This way one pays the latency only once per thousand IDs, when the first of them is generated (or, if the next block is reserved ahead of time, there is almost no latency at all).
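The block-reservation idea can be sketched the same way. An in-memory `CasCounter` (hypothetical) stands in for the single Cassandra row that would be updated with a conditional `UPDATE ... IF next_id = ?`. `BlockAllocator` and `block_size` are illustrative names, not an existing API.

```python
import threading


class CasCounter:
    """Hypothetical stand-in for one Cassandra row changed only via a conditional UPDATE."""

    def __init__(self, start: int = 1) -> None:
        self.value = start
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self.value

    def compare_and_set(self, expected: int, new: int) -> bool:
        with self._lock:
            if self.value != expected:
                return False
            self.value = new
            return True


class BlockAllocator:
    """Hands out IDs from a locally reserved block and touches the shared
    counter (the slow, quorum-backed call) only once per block_size IDs."""

    def __init__(self, counter: CasCounter, block_size: int = 1000) -> None:
        self.counter = counter
        self.block_size = block_size
        self._next = 0        # next ID to hand out locally
        self._limit = 0       # first ID past the reserved block
        self.round_trips = 0  # CAS attempts sent to the "cluster"
        self._lock = threading.Lock()

    def _reserve_block(self) -> None:
        while True:
            start = self.counter.read()
            self.round_trips += 1
            if self.counter.compare_and_set(start, start + self.block_size):
                self._next, self._limit = start, start + self.block_size
                return

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._limit:
                self._reserve_block()
            value = self._next
            self._next += 1
            return value
```

Two allocators sharing one counter reserve disjoint blocks (for example 1..1000 and 1001..2000), so IDs stay unique. Interleaved across clients, however, they are no longer strictly increasing, and the unused part of a block is lost if its client restarts.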

I understand that doing so will create a hot spot, i.e. contention on that single counter.

One way to avoid this contention is to use several sequence-number generators, each with a different offset (modulo), and to limit the chance of two clients colliding on the same generator by picking one at random.
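The modulo-striped variant can be sketched under the same assumptions: an in-memory stand-in for Cassandra and hypothetical class names. Shard `k` of `N` only produces IDs congruent to `k` mod `N`, so two shards can never hand out the same ID. Picking a shard at random only spreads the compare-and-set traffic, and that spreading is what reduces contention.

```python
import random
import threading
from typing import Optional


class CasCounter:
    """Hypothetical stand-in for one Cassandra row changed only via a conditional UPDATE."""

    def __init__(self, start: int = 0) -> None:
        self.value = start
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self.value

    def compare_and_set(self, expected: int, new: int) -> bool:
        with self._lock:
            if self.value != expected:
                return False
            self.value = new
            return True


class StripedSequence:
    """N independent counters; shard k maps its n-th value to n * N + k."""

    def __init__(self, shards: int, rng: Optional[random.Random] = None) -> None:
        self.shards = shards
        self.counters = [CasCounter(0) for _ in range(shards)]
        self.rng = rng or random.Random()

    def next_id(self) -> int:
        k = self.rng.randrange(self.shards)  # random shard spreads the load
        counter = self.counters[k]
        while True:
            n = counter.read()
            if counter.compare_and_set(n, n + 1):
                return n * self.shards + k
```

The price is that IDs are unique but not globally ordered, and there can be gaps wherever one shard is picked less often than the others.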

So this will be my naive implementation.

Now that Cassandra 3.0 has hit the street, I wonder about three things:

  1. Does Cassandra offer a smarter way of implementing sequences?
  2. Does Cassandra offer something to ease the pain of implementing this? I mean, I do a read and then I do a compare-and-set. Is there something smarter?
  3. Does a library already exist that gives me some kind of sequence numbers?

Jonathan has opened a Jira for this topic - https://issues.apache.org/jira/browse/CASSANDRA-9200

3.0 isn't out yet, but it appears the committers are finalizing the features for 3.0, and 9200 seems to be set for 3.1 (which really means "sometime after 3.0" - maybe 3.1, maybe 3.2, maybe 4.0).

For your questions:

1) No, there is no built-in way to do sequencing in Cassandra at this time.

2) No, you're going to have to do a read-before-write, or reserve blocks of the sequence per node if you can tolerate sequences that aren't strictly increasing.

3) Twitter published Snowflake at one point ( https://github.com/twitter/snowflake ), but it's now retired. Generally, I tend to use type 1 UUIDs, which are timestamp-based, with clock-sequence and node fields that are often randomized. Even UUIDs aren't foolproof, but for our workloads they tend to be 'good enough'. The Simpleflake post ( http://engineering.custommade.com/simpleflake-distributed-id-generation-for-the-lazy/ ) discusses the tradeoffs and also offers its own generator.
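For comparison, a Snowflake-style generator needs no coordination at all once each worker has a unique ID. This is a sketch of the commonly described layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence), not Twitter's code. The class name and the 2020-01-01 epoch are arbitrary choices for illustration.

```python
import threading
import time


class SnowflakeStyleGenerator:
    """64-bit IDs: (ms since epoch) << 22 | worker_id << 12 | sequence."""

    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, worker_id: int, epoch_ms: int = 1_577_836_800_000) -> None:
        # epoch_ms defaults to 2020-01-01T00:00:00Z (an arbitrary choice).
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return ((now - self.epoch_ms) << (self.WORKER_BITS + self.SEQUENCE_BITS)) \
                | (self.worker_id << self.SEQUENCE_BITS) | self._sequence
```

IDs from one worker are strictly increasing, and IDs across workers are roughly time-ordered. Uniqueness depends on unique worker IDs and a clock that does not jump backwards. Python's standard `uuid.uuid1()` produces the time-based (type 1) UUIDs mentioned in the answer, and Cassandra's `timeuuid` column type stores the same kind of value.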


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM