
Relational Database locking

During a job interview I was asked the following question. Consider this situation: about 10 threads take information from a queue (stored as Message objects), parse it, and write it to the same database table. Although writing the data to the database is complicated and involves a number of actions, including inserts, deletes and updates, the whole process is atomic, that is, it is bracketed by open-transaction and close-transaction actions. When the queue started to receive duplicated messages, the DB started to face a lot of problems such as locks, deadlocks, rollbacks, etc. Pseudo code illustration:

function void doWork(Message msg){
  int msgId = msg.getId();
  Timestamp msgTime = msg.getTime();
  Data data = msg.getData();
  OpenTransaction;
     // manipulate db 
     parseIntoDB(msgId, msgTime,data);
  CloseTransaction; 
}

It is worth noting that the table has no constraints. I was asked two questions:

  1. Why do the locks occur only with duplicated data?

  2. How can the problem be solved quickly, without regard to performance?

After a while, when I couldn't explain why the locking occurs, she said it happens because of locks on the same rows. So I suggested that we should synchronize access to the database inside the function, and therefore put a synchronized block around parseIntoDB:

function void doWork(Message msg){
  int msgId = msg.getId();
  Timestamp msgTime = msg.getTime();
  Data data = msg.getData();
  OpenTransaction;
     synchronized(someObject){
        // manipulate db 
        parseIntoDB(msgId, msgTime,data);
     }
  CloseTransaction; 
}

According to her response I was heading in the right direction, but what I still don't know is whether to lock only the parseIntoDB call or the transaction operations as well, and, second, which object should be used as the monitor?

The question tests knowledge of theoretical transaction isolation levels and possibly implementation details like MVCC (Oracle) vs. row and page locking (MSSQL).

I won't go into details here, as this is the topic of books, but I suspect the interviewer wants to hear about access serialization, idempotent actions, work minimization, scalability, consistency models, and maybe optimistic locking or A/B deadlocks.

We can't really answer this question without knowing the details of how transactions are implemented and what the processing does. We must also know what we are allowed and/or required to do with duplicate messages: must we process them, may we skip them, must we process them only partially, or must we not process them at all? In addition, for concurrent processing to be sound we must address many aspects of the state of the system. But we can talk in generalities.

I will assume that a msgId identifies a message.

When there are no duplicate messages, every transaction works on a different msgId. Since there were no problems before but there are now, we can expect that the DBMS has been set up to allow transactions on different msgIds to proceed concurrently.

But with multiple copies of a message arriving (even if they're not in the queue at the same time), multiple threads can simultaneously be trying to affect whether overlapping sets of rows are in the database. This leads to the problems.

Roughly speaking, our problems will not occur if there are no duplicates among the msgIds being used by the threads. So a solution is a manager of locks per msgId. A thread gets a msgId from the queue. It tries to acquire a lock on that msgId. If it fails then, per policy, it either throws the message away and asks for another, or it waits. When the thread is done with a msgId it unlocks it. Waiting must follow some protocol, implemented by the lock manager, that enforces liveness.
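As a rough illustration of such a lock manager, here is a minimal Java sketch. The class name MsgIdLockManager and its methods are made up for this example and are not part of the original question. It assumes msgId is an int (as in the pseudo code) and that all workers run in the same JVM. tryLock covers the "throw the message away" policy and lock covers the "wait" policy. Waiters are woken with notifyAll, so no wake-up is lost, but this sketch does not guarantee fairness among waiting threads.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-msgId lock manager; the names are illustrative only. */
class MsgIdLockManager {
    // msgIds currently owned by some worker thread
    private final Set<Integer> held = new HashSet<>();

    /** "Discard" policy: returns true if the caller now owns msgId, false if it is taken. */
    public synchronized boolean tryLock(int msgId) {
        return held.add(msgId);
    }

    /** "Wait" policy: blocks until no other thread owns msgId, then takes it. */
    public synchronized void lock(int msgId) throws InterruptedException {
        while (held.contains(msgId)) {
            wait(); // releases the monitor until some unlock() calls notifyAll()
        }
        held.add(msgId);
    }

    /** Releases msgId and wakes every waiter so each can re-check its own id. */
    public synchronized void unlock(int msgId) {
        held.remove(msgId);
        notifyAll();
    }
}
```

A worker would acquire the lock before opening the transaction and release it in a finally block after the transaction has closed, so that no two transactions for the same msgId are ever open at the same time.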

However, a better solution is for threads to simply see a queue that gives them a msgId but where they also tell the queue when they are done with that msgId. Such a queue is implemented using such a lock manager and the old queue.
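A queue of this kind could look roughly like the following Java sketch. ExclusiveMsgQueue, take and done are hypothetical names chosen for this example. To keep it short, the underlying queue is assumed to hold bare msgIds in a BlockingQueue<Integer> rather than full Message objects. The sketch also hard-codes the "wait" policy: a worker that draws a msgId another worker still holds blocks until that worker reports done.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical wrapper: hands out a msgId only while no other worker holds that id. */
class ExclusiveMsgQueue {
    private final BlockingQueue<Integer> source;           // the "old" queue (assumed to carry msgIds)
    private final Set<Integer> inProgress = new HashSet<>(); // msgIds handed out and not yet done

    ExclusiveMsgQueue(BlockingQueue<Integer> source) {
        this.source = source;
    }

    /** Takes the next msgId; if a copy of it is still being processed, waits for done(). */
    int take() throws InterruptedException {
        int msgId = source.take();
        synchronized (inProgress) {
            while (!inProgress.add(msgId)) { // add() returns false while the id is held
                inProgress.wait();
            }
        }
        return msgId;
    }

    /** Called by a worker once its transaction for msgId has finished. */
    void done(int msgId) {
        synchronized (inProgress) {
            inProgress.remove(msgId);
            inProgress.notifyAll();
        }
    }
}
```

With this wrapper the worker code contains no explicit locking: it calls take(), runs its transaction, and calls done() in a finally block.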

I repeat that this is coarse and general. Sound concurrent programming is complex.
