Multiple worker threads and Database synchronization

I have multiple threads that save files to disk and record that information in the DB.

On the other side of the app, I have multiple threads that read this information from the DB and process the mentioned files one at a time, sorted by file_id:

SELECT * FROM files_to_process ORDER BY file_id

What I've come up with is a PROCESSING_STATUS column with four statuses: NEW, PROCESSING, FAILED, and SUCCESS.

Every worker is supposed to read ONLY one row from the DB, sorted by ID, with status NEW, and immediately update it to status PROCESSING so that the other workers won't process the same file.

But something tells me that I might end up with a race condition.

Will transactions solve this problem?

Unfortunately, I can't do the whole operation inside one transaction, since processing files takes a long time and the transaction pool would be exhausted. So I have to use two transactions, in the following order:

  1. [In Transaction] Fetch row and update to status PROCESSING
  2. [No Transaction] Process file
  3. [In Transaction] Update final state to SUCCESS or FAILED depending on result
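
The three steps above can be sketched as a small worker skeleton. This is only an illustration under assumptions: `FileQueue`, `claimNext`, `finish`, `Worker.drain`, and the `processFile` callback are hypothetical names, and an in-memory map stands in for the files_to_process table so the sketch runs without a database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.LongPredicate;

enum Status { NEW, PROCESSING, FAILED, SUCCESS }

// Hypothetical in-memory stand-in for the files_to_process table.
class FileQueue {
    private final ConcurrentSkipListMap<Long, Status> rows = new ConcurrentSkipListMap<>();

    void add(long fileId) { rows.put(fileId, Status.NEW); }

    Status statusOf(long fileId) { return rows.get(fileId); }

    // Step 1: pick the lowest NEW file_id and flip it to PROCESSING.
    // replace(key, old, new) is atomic, so two workers cannot claim the same row.
    Optional<Long> claimNext() {
        for (Map.Entry<Long, Status> e : rows.entrySet()) {
            if (e.getValue() == Status.NEW
                    && rows.replace(e.getKey(), Status.NEW, Status.PROCESSING)) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    // Step 3: record the final state of a row this worker claimed.
    void finish(long fileId, boolean ok) {
        rows.replace(fileId, Status.PROCESSING, ok ? Status.SUCCESS : Status.FAILED);
    }
}

class Worker {
    // Claims and processes rows until no NEW row is left; returns how many were handled.
    static int drain(FileQueue queue, LongPredicate processFile) {
        int handled = 0;
        Optional<Long> next;
        while ((next = queue.claimNext()).isPresent()) {
            long id = next.get();
            boolean ok;
            try {
                ok = processFile.test(id);   // Step 2: slow work, outside any transaction
            } catch (RuntimeException ex) {
                ok = false;                  // a thrown exception counts as FAILED
            }
            queue.finish(id, ok);
            handled++;
        }
        return handled;
    }
}
```

Note that a worker that dies between steps 1 and 3 leaves its row in PROCESSING; nothing in this sketch detects that.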

Quite annoyingly, UPDATE does not take a LIMIT in PostgreSQL.

You can do something like this:

update files_to_process set processing_status='PROCESSING' where file_id = (
    SELECT file_id FROM files_to_process 
      WHERE processing_status = 'NEW' 
      ORDER BY file_id FOR UPDATE SKIP LOCKED LIMIT 1
) returning *;

With this formulation, there should be no race conditions. You would run this in a transaction by itself (or under autocommit, just run the statement and it will automatically form its own transaction).
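
From Java, the claim statement could be issued roughly as follows. This is a sketch under assumptions, not a tested integration: `ClaimDao` and `claimNext` are hypothetical names, the connection is assumed to come from the PostgreSQL JDBC driver with autocommit left on, file_id is assumed to be an integer type, and the statement returns only file_id instead of `*` to keep the example short.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.OptionalLong;

class ClaimDao {
    // Same claim statement as above, except that it returns only file_id.
    static final String CLAIM_SQL =
          "UPDATE files_to_process SET processing_status = 'PROCESSING' "
        + "WHERE file_id = ("
        + "  SELECT file_id FROM files_to_process"
        + "  WHERE processing_status = 'NEW'"
        + "  ORDER BY file_id FOR UPDATE SKIP LOCKED LIMIT 1"
        + ") RETURNING file_id";

    // Returns the claimed file_id, or empty when no unlocked NEW row exists.
    // With autocommit on, the single statement forms its own transaction.
    static OptionalLong claimNext(Connection conn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CLAIM_SQL);
             ResultSet rs = ps.executeQuery()) {
            return rs.next()
                    ? OptionalLong.of(rs.getLong("file_id"))
                    : OptionalLong.empty();
        }
    }
}
```

A worker would call `ClaimDao.claimNext(conn)`, process the file with no transaction open, and then issue a second short UPDATE to set SUCCESS or FAILED.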

But rather than using just 'PROCESSING', I would probably make it 'PROCESSING by machine worker7 PID 19345' or something like that. Otherwise, how will you know that processing failed if it fails in an unclean way? (That is the nice thing about doing it all in one transaction: failures should roll themselves back.)
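
One way such a status string might be built in Java is sketched below. `OwnerTag` and its methods are hypothetical names, the "unknown-host" fallback is an arbitrary choice, and the sketch assumes the processing_status column is a text type wide enough to hold the tag.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class OwnerTag {
    private static final String PREFIX = "PROCESSING by machine ";

    // Pure formatting helper, e.g. "PROCESSING by machine worker7 PID 19345".
    static String format(String machine, long pid) {
        return PREFIX + machine + " PID " + pid;
    }

    // Tag for the current JVM; falls back to "unknown-host" if the lookup fails.
    static String current() {
        String host;
        try {
            host = InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            host = "unknown-host";
        }
        return format(host, ProcessHandle.current().pid());   // requires Java 9+
    }

    // True if the status value marks a row as claimed by some worker.
    static boolean isClaimed(String status) {
        return status != null && status.startsWith(PREFIX);
    }
}
```

With a tag like this in the row, an operator can check whether the named process on the named machine is still alive before resetting a stuck row to NEW.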

"Unfortunately I can't make all operation inside transaction since processing files takes a lot of time and transaction pool will be exhausted"

But you should never have more outstanding transactions than you have CPUs available to do work. Unless you have a very large compute farm, you should be able to make the pool large enough. But the big problem with this approach is that you have no visibility into what is happening.

For the two-transaction approach, you will probably want to create a partial index for performance:

create index on files_to_process (file_id) where processing_status = 'NEW';

Otherwise you will have to dig through all of the completed rows with low file_id to find the next NEW one, and eventually that will get slow. You might also need to VACUUM the table more aggressively than the default.

Try a mutex. A simplistic example:

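// Assumes mutex is a java.util.concurrent.Semaphore created with one permit,
// e.g. new Semaphore(1), shared by all worker threads.
try {
  mutex.acquire();
  try {
    // access and update record to processing
  } finally {
    mutex.release();
  }
} catch (InterruptedException ie) {
  // ...
}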

Depending on your code, you may lock it in various ways; see: Is there a Mutex in Java?

EDIT:

Sorry, I thought this was a C++ question; this is the Java version.
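
A runnable version of the mutex idea might look like the sketch below. It is an illustration only: `MutexClaimDemo`, `claimNext`, and `runWorkers` are hypothetical names, and an in-memory TreeMap stands in for the table. A Semaphore like this only coordinates threads inside one JVM, so it assumes all workers run in the same process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class MutexClaimDemo {
    private final Semaphore mutex = new Semaphore(1);            // one permit acts as a mutex
    private final TreeMap<Long, String> rows = new TreeMap<>();  // file_id -> status

    MutexClaimDemo(int fileCount) {
        for (long id = 1; id <= fileCount; id++) rows.put(id, "NEW");
    }

    // Marks the lowest NEW file_id as PROCESSING and returns it, or null if none is left.
    Long claimNext() throws InterruptedException {
        mutex.acquire();
        try {
            for (Map.Entry<Long, String> e : rows.entrySet()) {
                if (e.getValue().equals("NEW")) {
                    e.setValue("PROCESSING");
                    return e.getKey();
                }
            }
            return null;
        } finally {
            mutex.release();
        }
    }

    // Runs workerCount threads; true if every file was claimed exactly once.
    static boolean runWorkers(int fileCount, int workerCount) {
        MutexClaimDemo demo = new MutexClaimDemo(fileCount);
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger duplicates = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    Long id;
                    while ((id = demo.claimNext()) != null) {
                        if (!seen.add(id)) duplicates.incrementAndGet();
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        return duplicates.get() == 0 && seen.size() == fileCount;
    }
}
```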
