What causes this deadlock in my Ruby `trap` block?

I'm reading through Jesse Storimer's excellent book, Working with Unix Processes. In a section on trapping signals from child processes that have exited, he gives a code sample.

I have modified that code slightly (see below) to get more visibility into what's happening. The output confirms:

  • That the parent resumes its own execution between signals (which I can see from its puts).
  • That wait executes for multiple children within a single trap invocation (sometimes I get "Received a CHLD signal" once, followed by several "child pid <pid> exited" lines).
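
The second observation is expected: by the time the handler runs, more than one child may already have exited, and a pending standard signal such as SIGCHLD is not queued by the kernel, so one handler run may have several children to reap. That is why the handler loops over a non-blocking Process.wait. Below is a minimal sketch of that reaping loop on its own, with no trap involved, assuming a Unix platform where fork is available; the method name reap_exited_children is hypothetical, and the one-second sleep is a timing assumption for the demo.

```ruby
# Reap every child that has already exited, without blocking.
# Process.wait(-1, Process::WNOHANG) returns a PID for each exited child,
# nil when children remain but none has exited yet, and raises
# Errno::ECHILD once there are no children left at all.
def reap_exited_children
  reaped = []
  while (pid = Process.wait(-1, Process::WNOHANG))
    reaped << pid
  end
  reaped
rescue Errno::ECHILD
  reaped
end

# Demo: three children that exit immediately (exit! skips at_exit hooks).
pids = Array.new(3) { fork { exit!(0) } }
sleep 1 # assumption: one second is enough for all three to exit
reaped = reap_exited_children
puts "forked:             #{pids.sort.inspect}"
puts "reaped in one pass: #{reaped.sort.inspect}"
```

A single call collects every child that has already exited, which mirrors what one run of the trap block does in the question's code.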

Expected output

Normally the output from the code below looks something like this:

parent is working hard
Received a CHLD signal
child pid 73408 exited
parent is working hard
parent is working hard
parent is working hard
Received a CHLD signal
child pid 73410 exited
child pid 73409 exited
All children exited - parent exiting too.

The occasional error

But once in a while I get an error like this:

trapping_signals.rb:17:in `write': deadlock; recursive locking (ThreadError)
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `block in <main>'
    from trapping_signals.rb:17:in `call'
    from trapping_signals.rb:17:in `write'
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `block in <main>'
    from trapping_signals.rb:40:in `call'
    from trapping_signals.rb:40:in `sleep'
    from trapping_signals.rb:40:in `block in <main>'
    from trapping_signals.rb:38:in `loop'
    from trapping_signals.rb:38:in `<main>'

Can anyone explain to me what's going wrong here?

The code

child_processes = 3
dead_processes = 0

# We fork 3 child processes.
child_processes.times do
  fork do
    # Each sleeps between 0 and 5 seconds
    sleep rand(5)
  end
end

# Our parent process will be busy doing some work.
# But still wants to know when one of its children exits.

# By trapping the :CHLD signal our process will be notified by the kernel
# when one of its children exits.
trap(:CHLD) do
  puts "Received a CHLD signal"
  # Since Process.wait queues up any data that it has for us we can ask for it
  # here, since we know that one of our child processes has exited.

  # We loop over a non-blocking Process.wait to ensure that any dead child
  # processes are accounted for.
  # Here we wait without blocking.
  while pid = Process.wait(-1, Process::WNOHANG)
    puts "child pid #{pid} exited"
    dead_processes += 1

    # We exit ourselves once all the child processes are accounted for.
    if dead_processes == child_processes
      puts "All children exited - parent exiting too."
      exit
    end
  end
end

# Work it.
loop do
  puts "parent is working hard"
  sleep 1
end

Answer

I looked through the Ruby sources to see where that particular error is raised. It is only ever raised when the current thread tries to acquire a lock that it already holds. In other words, this locking is not re-entrant:

m = Mutex.new
m.lock
m.lock #=> same error as yours
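
The same behavior can be observed without aborting the script by rescuing the error. For contrast, the standard library's Monitor is re-entrant, so the thread that owns it may enter it again. A small sketch:

```ruby
require "monitor"

# Mutex is not re-entrant: a second lock by the owning thread raises.
m = Mutex.new
m.lock
mutex_error = nil
begin
  m.lock
rescue ThreadError => e
  mutex_error = e.message
end
m.unlock
puts "Mutex: #{mutex_error}"   # prints "Mutex: deadlock; recursive locking"

# Monitor is re-entrant: the owning thread can enter it again.
mon = Monitor.new
nested_ok = mon.synchronize { mon.synchronize { true } }
puts "Monitor nested synchronize: #{nested_ok}"
```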

Now at least we know what happens, but not yet why and where. The error message indicates that it happens during the call to puts. When puts is called, it eventually ends up in io_binwrite. stdout is not in sync mode ($stdout.sync is false), but it is buffered, so on the first call io_binwrite takes the branch that sets up a write buffer together with a write lock for that buffer. The write lock guarantees that writes to stdout are atomic: two threads writing to stdout at the same time must never mix up each other's output. To demonstrate what I mean:

t1 = Thread.new { 100.times { print "aaaaa" } }
t2 = Thread.new { 100.times { print "bbbbb" } }
t1.join
t2.join

Although both threads take turns writing to stdout, a single write is never broken up: you will always see the full five a's or five b's in a row. That is what the write lock is for.
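
To make the role of the write lock concrete, here is a simplified model of the buffered write path described above. It is hypothetical and is not Ruby's actual io.c code: the class ModelBufferedWriter and its pending_interrupt hook are illustrative names. Like the real path, it creates the buffer and a Mutex guarding it on the first write, and holds that Mutex while appending. The hook stands in for a trap handler that runs in the middle of a write (the question's backtrace shows write, then call, then write again), and a nested write from it fails just like the Mutex example.

```ruby
# Hypothetical model of a buffered write path; NOT Ruby's real io.c code.
class ModelBufferedWriter
  attr_reader :output
  # Stand-in for a trap handler that the interpreter might run mid-write.
  attr_accessor :pending_interrupt

  def initialize
    @buffer = nil
    @write_lock = nil
    @output = +""
  end

  def write(str)
    # First call only: create the buffer and the Mutex that guards it.
    # (The lazy setup is not itself thread-safe; the model ignores that.)
    if @buffer.nil?
      @buffer = +""
      @write_lock = Mutex.new
    end

    @write_lock.synchronize do
      @buffer << str
      # The lock is held here. Run a queued "handler" once, if there is one.
      handler = pending_interrupt
      self.pending_interrupt = nil
      handler&.call
      # "Flush": move the buffered bytes to the output.
      @output << @buffer
      @buffer.clear
    end
    str.bytesize
  end
end

writer = ModelBufferedWriter.new
writer.write("parent is working hard\n")

# Simulate a CHLD handler that runs while the same thread is inside write.
writer.pending_interrupt = -> { writer.write("Received a CHLD signal\n") }
nested_error = nil
begin
  writer.write("parent is working hard\n")
rescue ThreadError => e
  nested_error = e.message
end
puts "nested write failed: #{nested_error}"
```

The first write completes normally; the second fails because the simulated handler re-enters write on the thread that already owns the lock.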

What goes wrong in your case is a race condition on that write lock. The parent process loops and writes to stdout every second ("parent is working hard"). But the same thread also eventually executes the trap block and tries again to write to stdout ("Received a CHLD signal"). You can verify that it really is the same thread by adding #{Thread.current} to your puts statements. If the two events happen close enough together, that is, if the trap handler runs while a puts on that thread still holds stdout's write lock, you end up in the same situation as in the Mutex example above: the same thread trying to acquire the same lock twice, which triggers the error. Your backtrace shows one instance of this: the trap block, invoked while the main loop was in sleep (line 40 of your file), was itself interrupted inside its own puts (line 17) by another CHLD signal, and the nested handler's puts then tried to take the write lock again.
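
The Thread.current check suggested above can also be tried in isolation. A minimal sketch, assuming a Unix platform: SIGUSR1 is used here instead of SIGCHLD so that no child process is needed, and the handler only records the thread instead of calling puts (which is the very call that can deadlock).

```ruby
# Record which thread executes a trap handler.
handler_thread = nil
previous = trap(:USR1) { handler_thread = Thread.current }

begin
  Process.kill(:USR1, Process.pid)
  # The handler runs on the main thread at a later interrupt check,
  # so poll briefly instead of assuming delivery is instantaneous.
  50.times do
    break if handler_thread
    sleep 0.01
  end
ensure
  # Restore whatever handler was installed before.
  trap(:USR1, previous || "DEFAULT")
end

puts "main thread:    #{Thread.main.inspect}"
puts "handler thread: #{handler_thread.inspect}"
puts "same thread?    #{handler_thread.equal?(Thread.main)}"
```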
