
How do large scale databases handle locks?

I have a question about databases and updating rows.

I am currently running a Flask application, and the endpoint runs a command like this (please accept this pseudocode / shorthand syntax):

select * from Accounts where used = "False"
username = (first/random row of returned set)
update Accounts set used = "True" where name = username
return username

However, what if 100 people call this endpoint at the same time?

How can I avoid collisions? (Meaning two people don't get the same username from the table because the second person queries before the first person's update statement has run.)

The obvious solution is a lock -- this way, if both people hit the endpoint at the exact same time, the second person has to wait for the lock to release:

Global lock:

with lock:
    select * from Accounts where used = "False"
    username = (first/random row of returned set)
    update Accounts set used = "True" where name = username
    return username

I believe this would work, but it wouldn't be a great solution. Does anyone have any better ideas? I'm sure companies run into this data-consistency issue all the time; how do they solve it?

Thanks!

MySQL / InnoDB offers four transaction isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE.

Assuming you perform all the commands in a single transaction: under SERIALIZABLE, InnoDB implicitly converts plain SELECTs into locking reads, so only one transaction touching the same rows executes at a time. In the case of 100 users, one user would be executing the transaction while the remaining 99 waited in a queue.

Under READ UNCOMMITTED and READ COMMITTED, and also under REPEATABLE READ (InnoDB's default), a plain SELECT is a non-locking consistent read, so it would be possible for two or more users to read the same row while it was used = "False" and both try to set it to used = "True". At those levels you need an explicit locking read such as SELECT ... FOR UPDATE to prevent that.

I think it would be better if you refactored your database layout into two tables: one with all possible names, and the other with used names, with a unique constraint on the name column. For every new user, you would insert a new row into the used names table. If you tried to insert multiple users with the same name, you would get a unique constraint violated error, and would be able to try again with a different name.
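A minimal sketch of that two-table layout, using Python's sqlite3 so it runs anywhere (the table and column names here are invented for the demo; the same idea applies to MySQL). The point is that the database's unique constraint, not application code, arbitrates who gets each name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_names (name TEXT PRIMARY KEY)")
# PRIMARY KEY implies the unique constraint on used_names.name
conn.execute("CREATE TABLE used_names (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO all_names VALUES (?)", [("alice",), ("bob",)])
conn.commit()

def claim_name(conn):
    """Try unused names until one inserts cleanly; the unique
    constraint guarantees no two callers get the same name."""
    names = conn.execute(
        "SELECT name FROM all_names "
        "WHERE name NOT IN (SELECT name FROM used_names)"
    ).fetchall()
    for (name,) in names:
        try:
            conn.execute("INSERT INTO used_names VALUES (?)", (name,))
            conn.commit()
            return name  # the insert succeeded, so the name is ours
        except sqlite3.IntegrityError:
            conn.rollback()  # someone else took it; try the next name
    return None  # no names left
```

On a concurrent server each request would use its own connection, but the retry-on-constraint-violation logic is the same.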

Global locks on a database are a VERY bad thing. They will slow everything down immensely. Instead there are table locks (to be avoided), row locks (these are fine), and transactions.

Use a transaction. This serves to isolate your changes from others and theirs from yours. It also allows you to throw away all the changes (rollback) if there's a problem, so you don't leave a change half done. Unless you have a very good reason otherwise, you should ALWAYS be in a transaction.
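As a small illustration of rollback (using sqlite3 so it runs anywhere; the table and column names are invented for the demo), an uncommitted change is simply thrown away if something fails mid-transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, used TEXT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 'False')")
conn.commit()

try:
    conn.execute("UPDATE accounts SET used = 'True' WHERE name = 'alice'")
    raise RuntimeError("simulated failure before commit")
except RuntimeError:
    conn.rollback()  # the UPDATE is undone, not left half-done

used = conn.execute(
    "SELECT used FROM accounts WHERE name = 'alice'"
).fetchone()[0]
# used is still 'False' because the transaction was rolled back
```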

MySQL supports SELECT FOR UPDATE which tells the database that you're going to update the selected rows in this transaction so those rows get locked.

To use your pseudo-code example...

begin transaction
select * from Accounts where used = "False" for update
username = (first/random row of returned set)
update Accounts set used = "True" where name = username
commit
return username

Basically, transactions make a set of SQL statements "atomic" meaning they happen in a single operation from the point-of-view of concurrent use.


Other points... you should update with the primary key of Accounts to avoid the possibility of using a non-unique field. Maybe the primary key is username, maybe it isn't.

Second, a select without an order by can return in any order it wants. If you're working through a queue of accounts, you should probably specify some sort of order to ensure the oldest ones get done first (or whatever you decide your business logic will be). Even order by rand() will do a better job than relying on the default table ordering.

Finally, if you're only going to fetch one row, add a limit 1 so the database doesn't do a bunch of extra work.
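Putting those three points together, here is a runnable sketch using sqlite3 (which serializes writers at the database level, so the explicit FOR UPDATE is only shown as a comment; on MySQL/InnoDB you would uncomment it). The `id` and `created_at` columns are assumptions about the schema:

```python
import sqlite3

def claim_account(conn):
    """Claim the oldest unused account and return its name, or None."""
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # take the write lock up front (SQLite)
    try:
        cur.execute(
            "SELECT id, name FROM Accounts "
            "WHERE used = 'False' "
            "ORDER BY created_at "   # explicit ordering: oldest first
            "LIMIT 1"                # only fetch the one row we need
            # + " FOR UPDATE"        # <- add this on MySQL/InnoDB
        )
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return None
        account_id, name = row
        # Update by primary key, not by a possibly non-unique name.
        cur.execute(
            "UPDATE Accounts SET used = 'True' WHERE id = ?", (account_id,)
        )
        conn.commit()
        return name
    except Exception:
        conn.rollback()
        raise
```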

First of all, I would add a new field to the table; let's call it session id. Each client that connects to that endpoint should have a unique session id, something that sets it apart from the other clients.

Instead of doing a select and then an update, I would first update a single record, setting its session id field to the client's session id, and then retrieve the record based on the session id:

update Accounts
set used = "True", sessionid = ...
where used = "False" and sessionid is null
limit 1;

select name from Accounts where sessionid = ...;

This way you avoid the need for explicit locking, because the claim happens in a single atomic UPDATE statement.
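A runnable sketch of this claim-by-update pattern with sqlite3 (SQLite does not support UPDATE ... LIMIT, so a subquery picks the single row here; on MySQL you can put LIMIT 1 on the UPDATE directly, as shown above). The sessionid column is an assumption about the schema:

```python
import sqlite3

def claim_for_session(conn, session_id):
    """Atomically claim one unused account for this session, then read it back."""
    conn.execute(
        # Single atomic UPDATE: the row is claimed and tagged in one
        # statement, so no other session can grab it between a SELECT
        # and an UPDATE.
        "UPDATE Accounts SET used = 'True', sessionid = ? "
        "WHERE name = (SELECT name FROM Accounts "
        "              WHERE used = 'False' AND sessionid IS NULL LIMIT 1)",
        (session_id,),
    )
    conn.commit()
    row = conn.execute(
        "SELECT name FROM Accounts WHERE sessionid = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None  # None means no accounts were left
```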
