
Sequence Generator / Auto Increment using Cassandra 3.0

Posted 2015-04-26 16:45:44. Tags: java, cassandra, auto-increment, sequence-generators

I have read a lot of Cassandra's documentation and checked the Counter changes and the like. But it seems that Cassandra does not ship with a default, standard way to generate incremental sequences on the fly.

All I found is using the IF clause to do a compare-and-set (a lightweight transaction).

This way one can check whether a document (row) exists and, if it does not, create it. Since this is done with a quorum-based agreement across the cluster (Cassandra uses Paxos for this), it should be easy to use and safe, but it comes with high latency.
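A minimal sketch of that read-then-compare-and-set loop, assuming a hypothetical table sequences (name text PRIMARY KEY, next_id bigint). So that it runs on its own, an in-memory ConcurrentHashMap stands in for Cassandra: the hypothetical SequenceTable.compareAndSet plays the role of UPDATE sequences SET next_id = ? WHERE name = ? IF next_id = ?, and its boolean result mirrors the [applied] column Cassandra returns for a conditional update. Against a real cluster each of the two calls in nextId() would be a network round trip, which is where the latency comes from.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical in-memory stand-in for a Cassandra table
 *   CREATE TABLE sequences (name text PRIMARY KEY, next_id bigint);
 * Not a driver API: each method only mimics the CQL statement in its comment.
 */
final class SequenceTable {
    private final ConcurrentMap<String, Long> rows = new ConcurrentHashMap<>();

    // Mimics: INSERT INTO sequences (name, next_id) VALUES (?, 1) IF NOT EXISTS
    void createIfAbsent(String name) {
        rows.putIfAbsent(name, 1L);
    }

    // Mimics: SELECT next_id FROM sequences WHERE name = ?
    long read(String name) {
        Long value = rows.get(name);
        if (value == null) {
            throw new IllegalStateException("unknown sequence: " + name);
        }
        return value;
    }

    // Mimics: UPDATE sequences SET next_id = ? WHERE name = ? IF next_id = ?
    // The boolean plays the role of the [applied] column in the result.
    boolean compareAndSet(String name, long expected, long next) {
        return rows.replace(name, expected, next);
    }
}

/** Hands out 1, 2, 3, ... using a read-before-write followed by a compare-and-set. */
final class CasSequence {
    private final SequenceTable table;
    private final String name;

    CasSequence(SequenceTable table, String name) {
        this.table = table;
        this.name = name;
        table.createIfAbsent(name);
    }

    long nextId() {
        while (true) {
            long current = table.read(name);                       // round trip 1: read
            if (table.compareAndSet(name, current, current + 1)) { // round trip 2: CAS
                return current;                                    // this caller now owns 'current'
            }
            // Another client advanced next_id first: re-read and retry.
        }
    }
}
```

Under contention the loop simply retries, so every successful call costs at least two round trips and possibly more.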

To work around this latency, one can generate (reserve) a thousand IDs at once by incrementing nextSequenceId by 1000 instead of by 1. This way one pays the latency only once per thousand IDs, when the first of them is generated (or, if the next block is reserved ahead of time, almost no latency at all).
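The block reservation can be sketched as a hi/lo style allocator. This is an illustration under assumptions: an AtomicLong stands in for the shared nextSequenceId cell (in Cassandra it would be updated with a conditional UPDATE ... IF), and the class name and block size are hypothetical. Only reserveBlock() touches the shared counter; every other call is served from local memory.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical hi/lo allocator: one compare-and-set reserves a whole block
 * of IDs, which are then handed out locally without touching the cluster.
 */
final class BlockReservingSequence {
    private final AtomicLong sharedNextId; // stand-in for the shared Cassandra cell
    private final long blockSize;
    private long next = 0;      // next ID to hand out from the local block
    private long blockEnd = 0;  // exclusive end of the local block; 0 == 0 forces a first reservation

    BlockReservingSequence(AtomicLong sharedNextId, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.sharedNextId = sharedNextId;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next == blockEnd) {
            reserveBlock(); // the only expensive step, once per blockSize IDs
        }
        return next++;
    }

    /** Advance the shared counter by blockSize with a compare-and-set, retrying on conflict. */
    private void reserveBlock() {
        while (true) {
            long start = sharedNextId.get();                             // read
            if (sharedNextId.compareAndSet(start, start + blockSize)) {  // IF next_id = start
                next = start;
                blockEnd = start + blockSize;
                return;
            }
        }
    }
}
```

IDs from different clients interleave (client A may hold 1..1000 while client B holds 1001..2000), so they are unique but not issued in global order, and any IDs left in a block are lost if the process dies.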

I understand that doing so will create a hot spot, i.e. congestion on that single row.

One way to avoid this congestion is to use several sequence number generators, each with a different offset (modulo), and to limit the chance of collisions by picking one of the generators at random, i.e. by randomly choosing the modulo.
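One reading of the offset idea, sketched under assumptions: N counters (rows) are kept, and counter k only ever yields IDs congruent to k modulo N, namely k, k + N, k + 2N, and so on. Because each residue belongs to exactly one counter, two shards can never produce the same ID, and picking a shard at random spreads the compare-and-set contention over N rows instead of one. An AtomicLongArray stands in for the N Cassandra rows; ShardedSequence is a hypothetical name.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

/**
 * Hypothetical sharded sequence: shard k hands out k, k + N, k + 2N, ...
 * Each array slot stands in for a separate Cassandra row with its own CAS.
 */
final class ShardedSequence {
    private final AtomicLongArray rows; // rows.get(k) = how many IDs shard k has issued

    ShardedSequence(int shards) {
        if (shards <= 0) {
            throw new IllegalArgumentException("shards must be positive");
        }
        this.rows = new AtomicLongArray(shards); // all counters start at 0
    }

    /** Picks a shard uniformly at random to spread contention. */
    long nextId() {
        return nextId(ThreadLocalRandom.current().nextInt(rows.length()));
    }

    /** Issues the next ID from one specific shard. */
    long nextId(int shard) {
        int n = rows.length();
        while (true) {
            long issued = rows.get(shard);
            if (rows.compareAndSet(shard, issued, issued + 1)) { // IF counter = issued
                return issued * n + shard; // always congruent to shard (mod n), so never shared
            }
        }
    }
}
```

The price is ordering: IDs are unique but not increasing across shards, since one shard can be far ahead of another.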

So this would be my naive implementation.

Since Cassandra 3.0 has hit the streets, I just wonder about three things:

1) Does Cassandra offer a smarter way of implementing sequences?
2) Does Cassandra offer something to ease the pain of implementing this? I mean, I do a read and then I do a compare-and-set. Is there something smarter?
3) Does any library already exist that gives me some kind of sequence numbers?

1 Answer

Jonathan has opened a Jira ticket for this topic: https://issues.apache.org/jira/browse/CASSANDRA-9200

3.0 isn't out yet, but it appears the committers are finalizing the features for 3.0, and 9200 seems to be set for 3.1 (which really means "sometime after 3.0": maybe 3.1, maybe 3.2, maybe 4.0).

To your questions:

1) No, there is no built-in way to do sequencing in Cassandra at this time.

2) No. You're going to have to do a read-before-write or, if you can tolerate sequences that aren't strictly increasing, block out sections of the sequence per node.

3) Twitter published Snowflake at one point (https://github.com/twitter/snowflake), but it's now retired. Generally, I tend to use type 1 UUIDs, which are timestamp-based with random components. Even UUIDs aren't foolproof, but for our workloads they tend to be "good enough". Simpleflake (http://engineering.custommade.com/simpleflake-distributed-id-generation-for-the-lazy/) discusses the trade-offs and also offers its own generator.
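For the type 1 UUID route from answer 3, the sketch below builds a version 1 UUID by hand using only the JDK, following the RFC 4122 field layout: a 60-bit count of 100 ns intervals since 1582-10-15, split into time_low, time_mid and time_hi. As an assumption for illustration, the clock sequence and node fields are filled with random bits rather than a MAC address, and TimeUuids is a hypothetical name. In practice one would normally use CQL's now() function or the DataStax Java driver's helper (UUIDs.timeBased() in driver 3.x, Uuids.timeBased() in 4.x); this only shows what such a value contains.

```java
import java.security.SecureRandom;
import java.util.UUID;

/** Hypothetical builder for version 1 (time-based) UUIDs with random clock_seq/node bits. */
final class TimeUuids {
    // 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID fromUnixMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET; // 60-bit timestamp
        long msb = (ts << 32)                        // time_low (low 32 bits of ts)
                 | ((ts >>> 32) & 0xFFFFL) << 16     // time_mid (next 16 bits)
                 | 0x1000L                           // version = 1
                 | ((ts >>> 48) & 0x0FFFL);          // time_hi (top 12 bits)
        // RFC 4122 variant bits "10", remaining 62 bits random (stand-in for clock_seq + node)
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    static UUID now() {
        return fromUnixMillis(System.currentTimeMillis());
    }

    /** Recovers the millisecond timestamp via the JDK's UUID.timestamp(). */
    static long unixMillis(UUID uuid) {
        return (uuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }
}
```

Two calls in the same millisecond get the same timestamp and differ only in the random bits, which is the "not foolproof, but good enough" trade-off the answer describes.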