Thread-safe Enumerator in Ruby

Question

TLDR: Is there a thread-safe version of the Enumerator class in Ruby?

What I'm trying to do:

I have a method in a Ruby on Rails application that I want to run concurrently. The method is supposed to create a zip file containing reports from the site, where each file in the zip is a PDF. The conversion from HTML to PDF is somewhat slow, hence the desire to multi-thread.

How I expected to do it:

I wanted to use 5 threads, so I figured I would share a single Enumerator between them. Each thread would pop a value from the Enumerator and process it. Here's how I was thinking it would work:

t = Zip::OutputStream::write_buffer do |z|
  mutex = Mutex.new
  gen = Enumerator.new{ |g|
    Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}).find_each do |report|
      g.yield report
    end
  }
  5.times.map {
    Thread.new do
      begin
        loop do
          mutex.synchronize  do
            @report = gen.next
          end
          title = @report.title + "_" + @report.id.to_s
          title += ".pdf" unless title.end_with?(".pdf")
          pdf = PDFKit.new(render_to_string(:template => partial_url, locals: {array: [@report]},
                                            :layout => false)).to_pdf
          mutex.synchronize  do
            z.put_next_entry(title)
            z.write(pdf)
          end
        end
      rescue StopIteration
        # do nothing
      end
    end
  }.each {|thread| thread.join }
end

What happened when I tried it:

When I ran the above code, I got the following error:

FiberError at /generate_report
fiber called across threads

After some searching, I came across this post, which recommended that I use a Queue instead of an Enumerator, because Queues are thread-safe while Enumerators are not. While this might be reasonable for non-Rails applications, it is impractical for me.
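The error can be reproduced without Rails. The following is a minimal sketch, assuming MRI, where Enumerator#next is backed by a Fiber owned by the thread that first calls it. The three-element enumerator is a made-up stand-in for the find_each generator above.

```ruby
# A tiny enumerator standing in for the ActiveRecord-backed generator.
enum = Enumerator.new do |y|
  3.times { |i| y.yield i }
end

# The first call to #next creates the underlying fiber on the main thread.
first = enum.next

# Calling #next from a different thread tries to resume that same fiber.
error = Thread.new do
  begin
    enum.next
    nil
  rescue FiberError => e
    e
  end
end.value

puts first
puts error.class
```

A Mutex around gen.next does not help here: the lock serializes the calls, but the fiber still belongs to whichever thread called next first.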

Why I can't just use a Queue:

The nice thing about Rails 4 ActiveRecord is that it doesn't load queries until they are iterated over. And if you use a method like find_each to iterate, it loads records in batches of 1000, so you never have to hold an entire table in RAM at once. The result of the query I'm using, Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}), is large. Very large. I need to be able to load it on the fly, rather than doing something like:

gen = Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}).map(&queue.method(:push))

which would load the entire query result into RAM.

Finally, the question:

Is there a thread-safe way of doing this:

gen = Enumerator.new{ |g|
        Report.all.includes(...).find_each do |report|
          g.yield report
        end
}

so that I can pop data from gen across multiple threads, without having to load my entire Report table (and all of the includes) into RAM?

Answer 1

If you start the worker threads before filling up the queue, they will start consuming it as you fill it. As a rule of thumb the network is slower than the CPU, so each batch should be (mostly) consumed by the time the next batch arrives:

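queue = Queue.new

# Start the workers before filling the queue.
workers = 2.times.map do
  Thread.new do
    # A blocking pop waits for the producer, so a worker never exits just
    # because the queue is momentarily empty. pop returns nil once the
    # queue has been closed and drained (Queue#close needs Ruby 2.3+).
    while (item = queue.pop)
      p item
      sleep(0.1)
    end
  end
end

(0..1000).each { |i| queue.push(i) }
queue.close # tell the workers that no more items are coming

workers.each(&:join)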

If the workers still prove too slow and the queue grows too large, you can use SizedQueue, which blocks push once the queue reaches its maximum size:

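queue = SizedQueue.new(100)

workers = 2.times.map do
  Thread.new do
    # Blocking pop; returns nil once the queue is closed and drained
    # (SizedQueue#close needs Ruby 2.3+).
    while (item = queue.pop)
      p "#{item} - #{queue.size}"
      sleep(0.1)
    end
  end
end

# push blocks whenever 100 items are already waiting in the queue
(0..300).each { |i| queue.push(i) }
queue.close
workers.each(&:join)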
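Applied to the question, the same idea might look like the sketch below. It is not Rails code: each_report is a hypothetical stand-in for Report...find_each, the results queue stands in for the mutex-protected zip stream, and the buffer size of 10 is arbitrary. It assumes Ruby 2.3 or later for SizedQueue#close.

```ruby
# Hypothetical stand-in for Report.all.includes(...).find_each:
# yields one record at a time instead of building an array.
def each_report
  1.upto(50) { |id| yield({ id: id, title: "report" }) }
end

queue   = SizedQueue.new(10) # at most 10 records are buffered at once
results = Queue.new          # thread-safe collector (stand-in for the zip stream)

workers = 5.times.map do
  Thread.new do
    # Each thread keeps its record in a local variable, not shared state.
    # pop returns nil once the queue is closed and drained.
    while (report = queue.pop)
      title = "#{report[:title]}_#{report[:id]}"
      title += ".pdf" unless title.end_with?(".pdf")
      results << title # the slow PDF rendering would happen here
    end
  end
end

# The producer runs on the calling thread; push blocks while the queue is full,
# so records are only pulled from the source as fast as workers consume them.
each_report { |report| queue.push(report) }
queue.close
workers.each(&:join)

titles = []
titles << results.pop until results.empty?
puts titles.size
```

Because the producer iterates on a single thread, no Enumerator or Fiber is shared between threads, and memory use is bounded by the queue size plus whatever batch the source holds.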

Answer 1: 2015-09-11 05:52:37