
Thread-safe Enumerator in Ruby

TLDR: Is there a thread-safe version of the Enumerator class in Ruby?


What I'm trying to do:

I have a method in a Ruby on Rails application that I want to run concurrently. The method is supposed to create a zip file containing reports from the site, where each file in the zip is a PDF. The conversion from HTML to PDF is somewhat slow, hence the desire to multi-thread.

How I expected to do it:

I wanted to use 5 threads, so I figured I would share one Enumerator between them. Each thread would pop a value from the Enumerator and process it. Here's how I was thinking it would work:

t = Zip::OutputStream::write_buffer do |z|
  mutex = Mutex.new
  gen = Enumerator.new{ |g|
    Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}).find_each do |report|
      g.yield report
    end
  }
  5.times.map {
    Thread.new do
      begin
        loop do
          mutex.synchronize  do
            @report = gen.next
          end
          title = @report.title + "_" + @report.id.to_s
          title += ".pdf" unless title.end_with?(".pdf")
          pdf = PDFKit.new(render_to_string(:template => partial_url, locals: {array: [@report]},
                                            :layout => false)).to_pdf
          mutex.synchronize  do
            z.put_next_entry(title)
            z.write(pdf)
          end
        end
      rescue StopIteration
        # do nothing
      end
    end
  }.each {|thread| thread.join }
end

What happened when I tried it:

When I ran the above code, I got the following error:

FiberError at /generate_report
fiber called across threads
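
The error can be reproduced outside Rails. As far as I understand it, Enumerator#next drives the generator block through a Fiber, and a fiber can only be resumed from the thread that created it. A minimal sketch, with a toy generator standing in for the Report query:

```ruby
# Toy generator standing in for the Report query (illustrative only).
gen = Enumerator.new { |g| 3.times { |i| g.yield i } }

# The first call to #next creates the enumerator's internal fiber on this thread.
first = gen.next

# Calling #next from a different thread tries to resume that fiber and fails.
error = Thread.new do
  begin
    gen.next
    nil
  rescue FiberError => e
    e
  end
end.value

puts first
puts error.class
```

Wrapping gen.next in a Mutex does not help here, because the problem is which thread resumes the fiber, not simultaneous access.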

After some searching, I came across this post, which recommended that I use a Queue instead of an Enumerator, because Queues are thread-safe while Enumerators are not. While this might be reasonable for non-Rails applications, it is impractical for me.

Why I can't just use a Queue:

The nice thing about Rails 4 ActiveRecord is that it doesn't run a query until it is iterated over. And if you iterate with a method like find_each, it loads the records in batches of 1000, so you never have to hold an entire table in RAM at once. The result of the query I'm using, Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}), is large. Very large. I need to be able to load it on the fly, rather than doing something like:

gen = Report.all.includes("employee" => ["boss", "client"], "projects" => {"project_owner" => ["project_team"]}).map(&queue.method(:push))

Which will load the entire query result into RAM.

Finally, the question:

Is there a thread-safe way of doing this:

gen = Enumerator.new { |g|
  Report.all.includes(...).find_each do |report|
    g.yield report
  end
}

So that I can pop data from gen across multiple threads, without having to load my entire Report table (and all of the includes) into RAM?

If you start the worker threads before filling the queue, they will consume items as you push them. As a rule of thumb the network is slower than the CPU, so each batch should be (mostly) consumed by the time the next batch arrives:

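queue = Queue.new

# Start the workers first. A blocking pop waits for the producer instead of
# exiting (or raising ThreadError) while the queue is momentarily empty.
workers = 2.times.map do
  Thread.new do
    # :done is a sentinel that tells a worker there is no more work.
    while (item = queue.pop) != :done
      p item
      sleep(0.1)
    end
  end
end

(0..1000).each { |i| queue.push(i) }

# One sentinel per worker so every thread terminates.
workers.size.times { queue.push(:done) }

workers.each(&:join)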

If the workers still cannot keep up and the queue grows too large, you can use SizedQueue, which blocks push once the queue reaches its maximum size:

queue = SizedQueue.new(100)

workers = 2.times.map do
  Thread.new do
    # Blocking pop; :done is a sentinel that ends the loop.
    while (item = queue.pop) != :done
      p "#{item} - #{queue.size}"
      sleep(0.1)
    end
  end
end

# push blocks whenever 100 items are already waiting in the queue.
(0..300).each { |i| queue.push(i) }
workers.size.times { queue.push(:done) }

workers.each(&:join)
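
Applied to the question, the producer would be the find_each loop itself, so only the records waiting in the SizedQueue (plus the current batch) are held in memory. The following is a rough sketch rather than a drop-in solution: each_report is a hypothetical stand-in for Report...find_each, the hashes stand in for Report records, and the results queue stands in for the mutex-guarded zip writes.

```ruby
# Hypothetical stand-in for Report.all.includes(...).find_each:
# yields records lazily, one batch at a time.
def each_report(total: 25, batch_size: 10)
  (1..total).each_slice(batch_size) do |batch|
    batch.each { |id| yield({ id: id, title: "report_#{id}" }) }
  end
end

done    = :done                # sentinel marking the end of the work
queue   = SizedQueue.new(10)   # at most 10 records waiting at any time
results = Queue.new            # thread-safe collector (stand-in for the zip)

workers = 5.times.map do
  Thread.new do
    while (report = queue.pop) != done
      title = "#{report[:title]}_#{report[:id]}"
      title += ".pdf" unless title.end_with?(".pdf")
      results << title         # the real code would render and write the PDF here
    end
  end
end

# The producer runs on the calling thread and blocks when the queue is full.
each_report { |report| queue.push(report) }
workers.size.times { queue.push(done) }
workers.each(&:join)

titles = Array.new(results.size) { results.pop }.sort
puts titles.size
```

Note that the record is held in a local variable inside each worker. Sharing an instance variable such as @report between threads, as in the question, would let one thread overwrite another's record.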
