
Ensure auto_increment value ordering in MySQL

I have multiple threads writing events into a MySQL table, events.

The table has a tracking_no column, configured as auto_increment, that is used to enforce an ordering of the events. Different readers consume from events: each one polls the table regularly and remembers the tracking_no of the last event it consumed, so that every poll fetches only the events added since then.

It turns out that the current implementation can miss some events.

This is what's happening:

  • Thread-1 begins an "insert" transaction. It takes the next value from the auto_increment column (1) but takes a while to complete.
  • Thread-2 begins an "insert" transaction. It takes the next auto_increment value (2) and completes the write before Thread-1.
  • Reader polls and asks for all events with tracking_no greater than 0. It gets only event 2, because Thread-1 is still lagging behind. The event gets consumed and Reader updates its tracking status to 2.
  • Thread-1 completes the insert, and event 1 appears in the table.
  • Reader polls again for all events after 2. Although event 1 has now been inserted, it will never be picked up.
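The sequence above can be reproduced without a database. The following is only a minimal in-memory sketch: the `committed` set stands in for the rows visible to readers, and `poll` is a hypothetical helper, not part of any MySQL client.

```python
# Hypothetical in-memory model of the race; no MySQL is involved.

def poll(table, last_seen):
    """Return the committed tracking numbers greater than last_seen, in order."""
    return sorted(n for n in table if n > last_seen)

committed = set()   # rows that readers can currently see
last_seen = 0       # the reader's high-water mark

# Thread-1 has reserved tracking_no 1 but has not committed yet.
# Thread-2 reserves tracking_no 2 and commits first.
committed.add(2)

first_poll = poll(committed, last_seen)    # only event 2 is visible
last_seen = max(first_poll)                # reader advances to 2

committed.add(1)                           # Thread-1 finally commits

second_poll = poll(committed, last_seen)   # asks for tracking_no > 2
missed = committed - set(first_poll) - set(second_poll)
```

After the second poll, `missed` contains event 1: the row exists, but no query of the form "greater than the last value seen" will ever return it.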

It seems this could be solved by changing the auto_increment strategy to lock the entire table until a transaction completes, but we would like to avoid that if possible.

I can think of two possible approaches.

1) If your event inserts are guaranteed to succeed (i.e., you never roll back an event insert, and therefore there are never any persistent gaps in your tracking_no), then you can rewrite your Readers so that they keep track of the last contiguous event seen -- aka the last event successfully processed.

The reader queries the event store, starts processing the events in order, and stops as soon as it finds a gap. The remaining events are discarded for now; they will be returned again by a later poll. The next query uses the sequence number of the last successfully processed event.

Rollback makes a mess of this, though - scenarios with concurrent writes can leave persistent gaps in the stream, which would cause your readers to block.
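A minimal sketch of the gap check, assuming tracking_no starts at 1 and grows in steps of 1 (in MySQL terms, auto_increment_increment = 1). The function name `contiguous_prefix` is made up for illustration.

```python
def contiguous_prefix(tracking_nos, last_processed):
    """Return the tracking numbers that follow last_processed without a gap.

    Anything after the first gap is left out, so the caller will see it
    again on a later poll once the missing row has been committed.
    """
    accepted = []
    expected = last_processed + 1
    for n in sorted(tracking_nos):
        if n < expected:
            continue          # already processed, ignore
        if n != expected:
            break             # gap found: stop here
        accepted.append(n)
        expected += 1
    return accepted

# Poll 1: only event 2 is visible, so nothing is consumed yet.
poll_1 = contiguous_prefix([2], last_processed=0)
# Poll 2: event 1 has been committed, so both are consumed in order.
poll_2 = contiguous_prefix([1, 2], last_processed=0)
# A gap in the middle stops processing at the last contiguous event.
poll_3 = contiguous_prefix([3, 4, 6], last_processed=2)
```

The reader then stores the last element of the returned list (if any) as its new position. As noted above, a rolled-back insert would leave a gap that never closes, and this loop would stop in front of it forever.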

2) You could rewrite your query so that it also puts an upper bound on the time of the events it returns. See "MySQL create time and update time timestamp" for the mechanics of setting up timestamp columns.

The idea then is that your readers query for all events with a higher sequence number than the last successfully processed event, but with a timestamp less than now() - some reasonable SLA interval.

It generally doesn't matter if the projections of an event stream are a little bit behind in time. So you leverage this, reading events in the past, which protects you from writes in the present that haven't completed yet.
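As a rough sketch of that query: the column names `created_at` and `payload` are assumptions (the question only names tracking_no), and the filtering function below merely imitates the WHERE clause in memory so the behaviour can be checked without a server.

```python
from datetime import datetime, timedelta

# Assumed shape of the MySQL query; created_at and payload are hypothetical columns.
QUERY = """
SELECT tracking_no, payload
FROM events
WHERE tracking_no > %s
  AND created_at < NOW() - INTERVAL %s SECOND
ORDER BY tracking_no
"""

def visible_events(rows, last_processed, now, sla):
    """In-memory imitation of the WHERE clause above.

    rows is an iterable of (tracking_no, created_at) pairs.
    """
    cutoff = now - sla
    return sorted(
        no for no, created_at in rows
        if no > last_processed and created_at < cutoff
    )

now = datetime(2024, 1, 1, 12, 0, 0)
sla = timedelta(seconds=5)

# Event 2 was committed one second ago; event 1 is still in flight.
early = visible_events([(2, now - timedelta(seconds=1))], 0, now, sla)

# Ten seconds later both rows are committed and older than the cutoff.
later_now = now + timedelta(seconds=10)
rows = [(1, now - timedelta(seconds=2)), (2, now - timedelta(seconds=1))]
late = visible_events(rows, 0, later_now, sla)
```

One caveat with this sketch: a timestamp defaulted from the current time records when the insert statement ran, not when the transaction committed, so the interval has to be longer than the slowest writer is expected to take between inserting and committing.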

That doesn't work for the domain model, though -- if you are loading an event stream to prepare for a write, working from a stream that is a measurable interval in the past isn't going to be much fun. The good news is that the writers know which version of the object they are currently working on, and therefore where in the sequence their generated events belong. So you track the version in the schema, and use that for conflict detection.
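One common way to get that conflict detection is a unique constraint on the stream identifier plus version. The sketch below is an assumption-heavy illustration: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the `stream_id`, `version` and `payload` columns and the `append_event` helper are all hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in here so the example runs without a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        tracking_no INTEGER PRIMARY KEY AUTOINCREMENT,
        stream_id   TEXT    NOT NULL,
        version     INTEGER NOT NULL,
        payload     TEXT    NOT NULL,
        UNIQUE (stream_id, version)
    )
""")

def append_event(conn, stream_id, expected_version, payload):
    """Append an event as version expected_version + 1.

    Returns False if another writer already stored that version,
    which means the caller worked from a stale copy of the stream.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (stream_id, version, payload) VALUES (?, ?, ?)",
                (stream_id, expected_version + 1, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = append_event(conn, "order-42", 0, "created")     # becomes version 1
stale = append_event(conn, "order-42", 0, "cancelled")   # also claims version 1
second = append_event(conn, "order-42", 1, "paid")       # becomes version 2
```

Here the second writer, which loaded the stream at version 0 just like the first, is rejected by the constraint instead of silently interleaving its event. Ordering within a stream then comes from `version`, not from the auto_increment value.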

Note: It's not entirely clear to me that the sequence numbers should be used for ordering. See https://stackoverflow.com/a/9985219/54734

Synthetic keys (IDs) are meaningless anyway. Their order is not significant, their only property of significance is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say if one is greater or less than another.

So this may be a case of having the wrong problem.
