How to fix a deadlock in join() in Ruby

I am working with multi-threading in Ruby. The code snippet is:

  threads_array = Array.new(num_of_threads)
  1.upto(num_of_threads) do |i|
    Thread.abort_on_exception = true
    threads_array[i-1] = Thread.new {
      catch(:exit) do
        print "s #{i}"
        user_id = nil
        loop do
          user_id = user_ids.pop()
          if user_id == nil
            print "a #{i}"
            Thread.stop()
          end
          dosomething(user_id)
        end
      end
    }
  end
  #puts "after thread"
  threads_array.each {|thread| thread.join}

I am not using any mutex locks, but I get a deadlock. This is the output of the above code snippet:

s 2s 6s 8s 1s 11s 7s 10s 14s 16s 21s 24s 5s 26s 3s 19s 20s 23s 4s 28s 9s 12s 18s 22s 29s 30s 27s 13s 17s 15s 25a 4a 10a 3a 6a 21a 24a 16a 9a 18a 5a 28a 20a 2a 22a 11a 29a 8a 14a 23a 26a 1a 19a 7a 12fatal: deadlock detected

The above output tells me that the deadlock occurs after the user_ids array is empty and that it involves Thread's join and stop.

What is actually happening, and what is the solution to this error?

The simplest code to reproduce this issue is:

t = Thread.new { Thread.stop }
t.join # => exception in `join': deadlock detected (fatal)

Thread::stop → nil

Stops execution of the current thread, putting it into a "sleep" state, and schedules execution of another thread.

Thread#join → thr
Thread#join(limit) → thr

The calling thread will suspend execution and run thr. Does not return until thr exits or until limit seconds have passed. If the time limit expires, nil will be returned, otherwise thr is returned.

As far as I understand, you call Thread#join without parameters on the thread and wait for it to exit, but the child thread calls Thread.stop and goes into the sleep state. This is a deadlock: the main thread waits for the child thread to exit, but the child thread is sleeping and nothing will ever wake it.
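To make the sleep state concrete, here is a minimal sketch (not taken from the question) in which the main thread wakes the stopped worker with Thread#run before joining it. The polling loop and the :finished return value are assumptions chosen only for illustration.

```ruby
worker = Thread.new do
  Thread.stop        # park this thread until another thread wakes it
  :finished          # value of the block once the thread resumes
end

# Wait until the worker has actually gone to sleep.
sleep 0.01 until worker.status == "sleep"

worker.run           # wake the worker up
value = worker.value # Thread#value joins the thread and returns the block's result

puts value.inspect   # :finished
```

Without the call to worker.run, no thread would ever wake the worker, which is the situation Ruby reports as a deadlock.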

If you call join with the limit parameter, join gives up after the timeout and returns nil, so your program does not deadlock (the child thread is still asleep and is killed when the main thread exits):

t = Thread.new { Thread.stop }
t.join 1 # => Process finished with exit code 0
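A short sketch of what the timeout does and does not do, assuming a 0.2 second limit picked only to keep the example fast. It checks the return value of join and the state of the thread afterwards, then cleans the thread up explicitly rather than relying on process exit.

```ruby
# A thread that parks itself and is never woken up.
t = Thread.new { Thread.stop }

# join(limit) returns nil once the limit expires instead of raising.
result = t.join(0.2)

puts result.inspect  # nil, because the timeout expired
puts t.alive?        # true: the thread has not exited
puts t.status        # sleep

# Terminate the parked thread explicitly, then join it normally.
t.kill
t.join
```

The timeout only stops the caller from waiting. The worker itself stays parked until it is woken, killed, or the process ends.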

I would recommend exiting your worker threads with Thread.exit once they have finished their work, or getting rid of the infinite loop so that the thread reaches the end of its block normally, for example:

if user_id == nil
  raise StopIteration
end

#or 
if user_id == nil
  Thread.exit
end
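Put together, the worker loop from the question might look like the sketch below. The sample user_ids, the thread count, the Mutex around the shared array, the processed queue, and the body of dosomething are all assumptions added for illustration; only the overall shape comes from the question.

```ruby
Thread.abort_on_exception = true

user_ids       = (1..20).to_a  # sample data, assumed for illustration
num_of_threads = 4             # assumed value
lock           = Mutex.new     # guards the shared user_ids array
processed      = Queue.new     # thread-safe collector for results

# Hypothetical stand-in for the question's dosomething.
def dosomething(user_id)
  user_id * 2
end

threads = Array.new(num_of_threads) do
  Thread.new do
    loop do
      user_id = lock.synchronize { user_ids.pop }
      break if user_id.nil?    # no work left: leave the loop so the thread ends
      processed << dosomething(user_id)
    end
  end
end

threads.each(&:join)           # returns once every worker has finished
puts processed.size            # 20
```

Because each worker leaves its loop when the array is empty, every thread terminates and join returns without a timeout.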

In addition to Alex Kliuchnikau's answer, I'll add that #join can also raise this error when the thread is waiting on Queue#pop. A simple and deliberate solution is to call #join with a timeout.

This is from Ruby 2.2.2:

[27] pry(main)> q=Queue.new
=> #<Thread::Queue:0x00000003a39848>
[30] pry(main)> q << "asdggg"
=> #<Thread::Queue:0x00000003a39848>
[31] pry(main)> q << "as"
=> #<Thread::Queue:0x00000003a39848>
[32] pry(main)> t = Thread.new {
[32] pry(main)*   while s = q.pop
[32] pry(main)*     puts s
[32] pry(main)*   end  
[32] pry(main)* }  
asdggg
as
=> #<Thread:0x00000003817ce0@(pry):34 sleep>
[33] pry(main)> q << "asg"
asg
=> #<Thread::Queue:0x00000003a39848>
[34] pry(main)> q << "ashg"
ashg
=> #<Thread::Queue:0x00000003a39848>
[35] pry(main)> t.join
fatal: No live threads left. Deadlock?
from (pry):41:in `join'
[36] pry(main)> t.join(5)
=> nil
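An alternative to the timeout, sketched here under the assumption of Ruby 2.3 or later, is to close the queue once the producer is done. Queue#close makes pop return nil after the remaining items are drained, so the consumer loop ends and a plain join returns. On older versions such as the 2.2.2 session above, pushing nil as a sentinel value would end the same loop. The seen array is a name introduced only for this example.

```ruby
q    = Queue.new
seen = []

consumer = Thread.new do
  # pop blocks while the queue is empty and returns nil
  # once the queue is closed and fully drained.
  while (s = q.pop)
    seen << s
  end
end

q << "asdggg"
q << "as"
q.close            # signal that no more items will be pushed

consumer.join      # returns because the consumer loop has ended
puts seen.inspect  # ["asdggg", "as"]
```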

If I understand your intentions correctly, I would consider something simpler (and probably safer, since calling user_ids.pop() from within a thread looks scary to me):

user_ids = (0..19).to_a
number_of_threads = 3

user_ids \
  .each_slice(user_ids.length / number_of_threads + 1) \
  .map { |slice| 
      Thread.new(slice) { |s| 
        puts s.inspect 
      }
  }.map(&:join)
