
Ensure auto_increment value ordering in MySQL

Original post: 2016-03-01 15:48:34 · Tags: mysql, multithreading, event-sourcing

I have multiple threads writing events into a MySQL table, events.

The table has a tracking_no column configured as auto_increment, which is used to enforce an ordering of the events. Several readers consume from events: each one polls the table regularly and keeps the tracking_no of the last event it consumed, so that every poll fetches only the events added since the previous one.

It turns out that the current implementation can miss events.

This is what's happening:

Thread-1 begins an "insert" transaction. It takes the next value from the auto_increment column (1) but takes a while to complete.
Thread-2 begins an "insert" transaction. It takes the next auto_increment value (2) and completes the write before Thread-1.
Reader polls and asks for all events with tracking_no greater than 0. It gets event 2, because Thread-1 is still lagging behind. The event gets consumed and Reader updates its tracking status to 2.
Thread-1 completes the insert, and event 1 appears in the table.
Reader polls again for all events after 2. Although event 1 has now been inserted, it will never be picked up.
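The timeline above can be replayed without a database. The following is a minimal sketch, not the poster's code: `NaiveReader` and the `committed` set are hypothetical stand-ins for the polling reader and for the rows that are visible in events at a given moment.

```python
class NaiveReader:
    """Polls for tracking numbers above a cursor, then moves the cursor
    to the highest number it saw (the strategy described in the question)."""

    def __init__(self):
        self.last_seen = 0

    def poll(self, committed):
        new = sorted(n for n in committed if n > self.last_seen)
        if new:
            self.last_seen = new[-1]
        return new


committed = set()   # tracking numbers whose transactions have committed (assumed model)
reader = NaiveReader()

# Thread-1 has reserved tracking_no 1 but is still in flight.
# Thread-2 reserves tracking_no 2 and commits first.
committed.add(2)
first_poll = reader.poll(committed)

# Thread-1 commits late, after the reader has already moved its cursor to 2.
committed.add(1)
second_poll = reader.poll(committed)

print(first_poll)   # [2]
print(second_poll)  # []
```

The second poll returns nothing because event 1 sits below the stored cursor, which is exactly the lost event.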

It seems this could be solved by changing the auto_increment strategy to lock the entire table until a transaction completes, but we would like to avoid that if possible.

1 Answer

I can think of two possible approaches.

1) If your event inserts are guaranteed to succeed (i.e., you never roll back an event insert, and therefore there are never any persistent gaps in your tracking_no), then you can rewrite your Readers so that they keep track of the last contiguous event seen, that is, the last event successfully processed.

The reader queries the event store, starts processing the events in order, and stops if a gap is found. The remaining events are discarded. The next query starts from the sequence number of the last successfully processed event.

Rollback makes a mess of this, though: a rolled-back insert does not give its auto_increment value back, so scenarios with concurrent writes can leave persistent gaps in the stream, which would cause your readers to block.
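The gap-stopping reader from approach 1 can be sketched as follows. This is an illustration under stated assumptions: `contiguous_prefix` is a hypothetical helper name, the input is the list of tracking numbers returned by one poll (all greater than the cursor), and auto_increment is assumed to step by 1.

```python
def contiguous_prefix(tracking_nos, last_processed):
    """Return the run of tracking numbers that directly follows last_processed.

    Processing stops at the first gap; everything after the gap is left
    for a later poll, once the in-flight insert has committed.
    Assumes tracking numbers increase by exactly 1 (an assumption, not
    something the original post states).
    """
    accepted = []
    expected = last_processed + 1
    for n in sorted(tracking_nos):
        if n != expected:
            break  # gap found: an earlier insert has not committed yet
        accepted.append(n)
        expected += 1
    return accepted


# Event 1 is still in flight: the reader sees only 2 and processes nothing.
print(contiguous_prefix([2], last_processed=0))           # []
# Event 1 has committed: both events are processed, in order.
print(contiguous_prefix([1, 2], last_processed=0))        # [1, 2]
# A gap at 3 holds back 4 and 5 until 3 shows up.
print(contiguous_prefix([1, 2, 4, 5], last_processed=0))  # [1, 2]
```

If the insert that reserved 3 is rolled back, the last call keeps returning [1, 2] forever, which is the blocking behaviour described above.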

2) You could rewrite your query with an upper bound on events expressed in time. See "MySQL create time and update time timestamp" for the mechanics of setting up timestamp columns.

The idea then is that your readers query for all events with a higher sequence number than the last successfully processed event, but with a timestamp earlier than now() minus some reasonable SLA interval.

It generally doesn't matter if the projections of an event stream are a little bit behind in time. So you take advantage of this by reading events in the past, which protects you from writes in the present that haven't completed yet.
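A sketch of the time-lagged read in approach 2, written as an in-memory filter so that it runs without a database. The `created_at` column, the `visible_events` helper, and the 5-second lag are all assumptions made for illustration; the lag only helps if every insert transaction commits within it.

```python
from datetime import datetime, timedelta

# Roughly equivalent MySQL query (created_at and the interval are assumed):
#   SELECT tracking_no, created_at FROM events
#   WHERE tracking_no > %s AND created_at < NOW() - INTERVAL 5 SECOND
#   ORDER BY tracking_no;

def visible_events(events, last_processed, now, lag):
    """Return (tracking_no, created_at) pairs that are newer than the cursor
    but older than the cutoff now - lag, ordered by tracking_no."""
    cutoff = now - lag
    return sorted(
        (no, ts) for no, ts in events
        if no > last_processed and ts < cutoff
    )


now = datetime(2016, 3, 1, 15, 48, 34)
events = [
    (1, now - timedelta(seconds=10)),  # old enough to be considered settled
    (2, now - timedelta(seconds=1)),   # too recent: an earlier insert may still be in flight
]
settled = visible_events(events, last_processed=0, now=now, lag=timedelta(seconds=5))
print([no for no, _ in settled])  # [1]
```

Event 2 is picked up by a later poll, once it has aged past the cutoff.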

That doesn't work for the domain model, though. If you are loading an event stream to prepare for a write, working from a stream that is a measurable interval in the past isn't going to be much fun. The good news is that the writers know which version of the object they are currently working on, and therefore where in the sequence their generated events belong. So you track the version in the schema, and use that for conflict detection.
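Version-based conflict detection might look like the sketch below. It is an in-memory stand-in, not a MySQL implementation: `EventStream`, `VersionConflict`, and `aggregate_id` are hypothetical names, and in a real schema the same check would typically be enforced by a unique key on the (object id, version) pair.

```python
class VersionConflict(Exception):
    """Raised when a writer's expected version is stale."""


class EventStream:
    """Hypothetical in-memory event store keyed by (aggregate_id, version)."""

    def __init__(self):
        self._events = {}

    def current_version(self, aggregate_id):
        versions = [v for (a, v) in self._events if a == aggregate_id]
        return max(versions, default=0)

    def append(self, aggregate_id, expected_version, payload):
        """Store payload as version expected_version + 1, or fail if another
        writer has already extended the stream past expected_version."""
        actual = self.current_version(aggregate_id)
        if actual != expected_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {actual}"
            )
        new_version = expected_version + 1
        self._events[(aggregate_id, new_version)] = payload
        return new_version


store = EventStream()
store.append("order-1", expected_version=0, payload="created")  # becomes version 1

try:
    # A second writer that also loaded version 0 loses the race.
    store.append("order-1", expected_version=0, payload="cancelled")
    conflict = False
except VersionConflict:
    conflict = True

print(conflict)  # True
```

The losing writer reloads the stream, re-evaluates its command against the newer version, and retries if the command still applies.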

Note: It's not entirely clear to me that the sequence numbers should be used for ordering. See https://stackoverflow.com/a/9985219/54734

Synthetic keys (IDs) are meaningless anyway. Their order is not significant, their only property of significance is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say if one is greater or less than another.

So this may be a case of having the wrong problem.