
Running command-line processes in parallel in Ruby

I'm using PhantomJS, a command-line tool, to render images of websites, and I want to run a number of these renders in parallel instead of one after the other. How can I do this?

Here's an example using Resque. Note that I've left escaping out for brevity... you should never pass external inputs directly into shell commands.

class RasterizeWebPageJob
  @queue = :screenshots
  def self.perform(url)
    system("/usr/bin/env DISPLAY=:1 phantomjs rasterize.js #{url} ...")
  end
end

10.times { Resque.enqueue(RasterizeWebPageJob, "http://google.com/") }

Provided you're running enough workers (and there are workers available), the jobs will execute in parallel. The important thing here is that you put separate jobs onto the queue instead of processing multiple screenshots from within a single job.
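On the escaping caveat above: one way to sidestep shell escaping entirely is to pass the command to `system` as separate arguments, so no shell ever parses the URL. The following is a minimal sketch, not the original author's code. The helper names `rasterize_command` and `rasterize` and the `output_path` parameter are assumptions made for illustration, as is the decision to accept only http(s) URLs.

```ruby
require "uri"

# Hypothetical helper: builds an argv array for phantomjs so the URL is passed
# as one uninterpreted argument. `output_path` is an assumed extra argument.
def rasterize_command(url, output_path)
  uri = URI.parse(url) # raises URI::InvalidURIError on malformed input
  unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
    raise ArgumentError, "expected an http(s) URL, got #{url.inspect}"
  end
  ["phantomjs", "rasterize.js", uri.to_s, output_path]
end

# Hypothetical wrapper: the env hash replaces the `/usr/bin/env DISPLAY=:1`
# prefix, and the multi-argument form of `system` bypasses the shell.
def rasterize(url, output_path)
  system({ "DISPLAY" => ":1" }, *rasterize_command(url, output_path))
end
```

A job's `perform` method could then call `rasterize(url, some_path)` instead of interpolating `url` into a command string.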

I'd advise against using Thread.new in a Rails controller. Queues are much easier (and safer) to manage than threads.

There are multiple ways of doing it. What you are looking for is a way to run asynchronous jobs in the background. This video may help: http://railscasts.com/episodes/128-starling-and-workling

I think what these other answers may be missing is a basic explanation of the design pattern that you'll want to use. Yes, Resque, Starling and Workling, or Resque combined with Foreman will be great solutions, but you'll probably want to know why.

I believe the pattern you'll want to use is the Observer Pattern, also called Publisher-Subscriber, or PubSub for short. In the simplest case, the idea is similar to how a printer might work.

A person (the publisher) clicks print in, say, a web browser. Then, asynchronously, the printer prints the document. If the printer is off, it picks up the queued messages when it turns on. If multiple people send documents to the printer, it takes them in order (FIFO) and processes (prints) them. If multiple printers listen to the same queue (this is where the metaphor breaks down, since you usually don't have that), they can take messages in turn and drain the queue faster.
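The printer metaphor can be sketched in plain Ruby with the core `Queue` class and a few worker threads acting as the "printers". This is only an in-process illustration of the pattern for a standalone script, not how Resque is implemented and not a substitute for it inside a Rails controller. The method name `process_in_parallel` and the `worker_count` parameter are made up for this example.

```ruby
# Hypothetical helper: several workers drain one shared FIFO queue.
# Jobs must be non-nil and non-false, since nil marks the end of the queue.
def process_in_parallel(jobs, worker_count: 4)
  queue = Queue.new
  jobs.each { |job| queue << job } # the publisher enqueues work
  queue.close                      # once empty, pop returns nil instead of blocking

  results = Queue.new              # thread-safe collection of [job, result] pairs
  workers = Array.new(worker_count) do
    Thread.new do
      while (job = queue.pop)      # each worker takes the next job in FIFO order
        results << [job, yield(job)]
      end
    end
  end
  workers.each(&:join)             # wait until every worker has finished

  Array.new(results.size) { results.pop }
end

# Usage with a stand-in for the real work (for example, a phantomjs call):
lengths = process_in_parallel(%w[a bb ccc], worker_count: 2) { |s| s.length }
```

Results come back in completion order rather than submission order, which is why each result is paired with its job.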

Resque and other PubSub gems, projects, and JARs (you're not limited to Ruby) implement this design pattern.

More info about the pattern is available at the links below (note that Java's Observable is a class, which is a bad design decision; you can implement your own):

http://ruby-doc.org/stdlib-2.0/libdoc/observer/rdoc/Observable.html
http://docs.oracle.com/javase/7/docs/api/java/util/Observable.html
http://en.wikipedia.org/wiki/Observer_pattern
http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
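For reference, Ruby's `Observable` module from the first link can be used roughly as follows. This is a small sketch with made-up class names (`ScreenshotPublisher`, `Recorder`); it assumes the `observer` library is available, which on newer Ruby versions means the bundled gem is installed. Note that `Observable` notifies observers synchronously, in the same process, so it shows the shape of the pattern rather than the asynchronous queueing that Resque provides.

```ruby
require "observer" # standard library; shipped as a bundled gem on newer Rubies

# Hypothetical publisher: announces each URL to all registered observers.
class ScreenshotPublisher
  include Observable

  def publish(url)
    changed               # mark as changed; otherwise notify_observers does nothing
    notify_observers(url) # calls #update(url) on every registered observer
  end
end

# Hypothetical subscriber: records every URL it is notified about.
class Recorder
  attr_reader :seen

  def initialize
    @seen = []
  end

  def update(url)
    @seen << url
  end
end

publisher = ScreenshotPublisher.new
recorder = Recorder.new
publisher.add_observer(recorder)
publisher.publish("http://google.com/")
```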

For our processing, we use Resque for smaller tasks, but you're still limited by the global interpreter lock and by other issues like having to deploy your code to a server, install gems, etc. We now use Storm (https://github.com/nathanmarz/storm) to handle our stream processing, and it works much better. Storm may be overkill for what you're trying to do, depending on how many images you process in a day.
