Running command-line processes in parallel in Ruby

Question

I'm using PhantomJS, a command-line tool, to render images of websites, and I want to run a number of these in parallel instead of doing one after the other. How can I do this?

Answer 1

Here's an example using Resque. Note that I've left out escaping for brevity; you should never pass external input directly into shell commands.

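require "resque"

class RasterizeWebPageJob
  @queue = :screenshots

  # Called by a Resque worker for each job it pulls off the :screenshots queue.
  def self.perform(url)
    system("/usr/bin/env DISPLAY=:1 phantomjs rasterize.js #{url} ...")
  end
end

# Enqueue ten separate jobs; idle workers pick them up concurrently.
10.times { Resque.enqueue(RasterizeWebPageJob, "http://google.com/") }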

Provided you're running enough workers (and they're free), the jobs will execute in parallel. The important thing is to put separate jobs onto the queue rather than processing multiple screenshots within a single job.
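If you'd rather not run a queue at all, the same one-job-per-screenshot idea can be sketched with plain Ruby processes. This is a minimal sketch, not a replacement for Resque: `screenshot_command` is a hypothetical helper, and its stand-in command just runs a short Ruby `sleep` so the example is runnable anywhere; you would substitute your real `phantomjs rasterize.js ...` arguments. Passing the command to `Process.spawn` as separate arguments skips the shell, which also sidesteps the escaping problem.

```ruby
require "rbconfig"

# Hypothetical helper: builds the argv for one screenshot job. A real version
# would return something like ["phantomjs", "rasterize.js", url, ...]; this
# stand-in runs a short Ruby script so the sketch works anywhere.
def screenshot_command(url)
  [RbConfig.ruby, "-e", "sleep 0.5", url]
end

# Starts every job at once, then waits for all of them.
# Returns [[url, succeeded?], ...] in the same order as the input.
def run_in_parallel(urls)
  # Process.spawn returns immediately with the child's pid, so after this
  # loop all the children are running at the same time.
  jobs = urls.map do |url|
    # Separate arguments bypass the shell, so the URL is never
    # shell-interpreted and needs no escaping.
    [url, Process.spawn(*screenshot_command(url), out: File::NULL)]
  end

  # Process.wait2 blocks until that particular child exits.
  jobs.map do |url, pid|
    _pid, status = Process.wait2(pid)
    [url, status.success?]
  end
end

run_in_parallel(%w[http://example.com/ http://example.org/]).each do |url, ok|
  puts "#{url}: #{ok ? 'done' : 'failed'}"
end
```

This launches every job immediately, so with hundreds of URLs you would want to cap concurrency, which is exactly what a fixed number of queue workers gives you.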

I'd advise against using Thread.new in a Rails controller. Queues are much easier (and safer) to manage than threads.

Answer 2

There are multiple ways of doing it. What you're looking for is a way to run asynchronous jobs in the background. This video may help: http://railscasts.com/episodes/128-starling-and-workling
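For a one-off script, the simplest form of "run it in the background" in plain Ruby is to spawn the process and detach it, so the caller carries on immediately. This is a minimal sketch under that assumption; `run_in_background` is a hypothetical helper name, and the stand-in command is a short Ruby `sleep` rather than a real `phantomjs` call. It gives you none of the retry, persistence, or monitoring features of a job queue like Starling/Workling or Resque.

```ruby
require "rbconfig"

# Starts a command in the background and returns without waiting for it.
# Process.detach starts a thread that reaps the child when it exits (so no
# zombie process is left behind); the thread's #value is a Process::Status.
def run_in_background(*command)
  pid = Process.spawn(*command)
  Process.detach(pid)
end

# Hypothetical stand-in for a slow phantomjs call.
waiter = run_in_background(RbConfig.ruby, "-e", "sleep 0.3")
puts "caller carries on while the child runs"

# Later, if you need the result, wait for it (this blocks until the child exits).
puts "child succeeded? #{waiter.value.success?}"
```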

Answer 3

I think what the other answers may be missing is a basic explanation of the design pattern you'll want to use. Yes, Resque, Starling and Workling, or Resque combined with Foreman are all great solutions, but you'll probably want to know why.

I believe the pattern you'll want to use is the Observer pattern, or Publisher-Subscriber (PubSub for short). In the simplest case, the idea is similar to how a printer works.

A person (the publisher) clicks Print in, say, a web browser. Then, asynchronously, the printer prints the document. If the printer is off, it will pick up the waiting messages when it turns on. If multiple people send documents to the printer, it takes them in order (FIFO) and processes (prints) them. If multiple printers listen to the same queue (this is where the metaphor breaks down, since you usually don't have that), they can take messages in turn and work through the queue faster.
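The printer metaphor can be sketched in-process with Ruby's built-in thread-safe `Queue`: documents go into a FIFO queue, and several "printers" (worker threads) take them in turn. This is an illustrative sketch only; `print_all` and `printer_count` are hypothetical names, and the `sleep` stands in for the real work. Unlike Resque, the queue lives in memory and disappears with the process.

```ruby
# A FIFO queue of documents consumed by several "printers" (worker threads).
# Returns [[printer_index, document], ...] in the order printing finished.
def print_all(documents, printer_count: 2)
  queue = Queue.new
  # Documents are queued before any printer starts, like a printer that
  # picks up waiting jobs when it is switched on.
  documents.each { |doc| queue << doc }
  queue.close # no more jobs; pop returns nil once the queue is drained

  log = Queue.new # thread-safe record of who printed what
  printers = Array.new(printer_count) do |i|
    Thread.new do
      # Each printer takes the next document in FIFO order until none remain.
      while (doc = queue.pop)
        sleep 0.05 # stand-in for the actual work (e.g. running a command)
        log << [i, doc]
      end
    end
  end
  printers.each(&:join)

  Array.new(log.size) { log.pop }
end

print_all(%w[report.pdf invoice.pdf photo.png], printer_count: 2).each do |printer, doc|
  puts "printer #{printer} printed #{doc}"
end
```

Adding printers (raising `printer_count`) is the "multiple printers on one queue" case: the queue drains faster without any change to the publishers.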

Resque and other PubSub gems, projects, and JARs (you're not limited to Ruby) implement this design pattern.

More information about the pattern is available here (note that Java's Observable is a class rather than an interface, which is a bad design decision; you can implement your own):

http://ruby-doc.org/stdlib-2.0/libdoc/observer/rdoc/Observable.html
http://docs.oracle.com/javase/7/docs/api/java/util/Observable.html
http://en.wikipedia.org/wiki/Observer_pattern
http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern
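Implementing your own is only a few lines. The sketch below is a hand-rolled publisher, not Ruby's `Observable` module or any gem's API; `Publisher`, `subscribe`, and `publish` are hypothetical names. Subscribers are any objects that respond to `#call` (blocks, lambdas, or your own classes), so nothing has to inherit from a base class.

```ruby
# A minimal hand-rolled publisher. Subscribers are any objects that
# respond to #call, so no base class or inheritance is required.
class Publisher
  def initialize
    @subscribers = []
  end

  # Accepts a callable object or a block; returns self so calls can chain.
  def subscribe(subscriber = nil, &block)
    callable = subscriber || block
    raise ArgumentError, "subscriber must respond to #call" unless callable.respond_to?(:call)

    @subscribers << callable
    self
  end

  # Delivers the message to every subscriber, in subscription order.
  def publish(message)
    @subscribers.each { |s| s.call(message) }
  end
end

rendered = []
publisher = Publisher.new
publisher.subscribe { |url| rendered << "screenshot of #{url}" }
publisher.subscribe(->(url) { puts "queued #{url}" })
publisher.publish("http://google.com/")
```

This delivers messages synchronously inside one process; what queue systems like Resque add is durable, asynchronous delivery to subscribers (workers) that may live in other processes or on other machines.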

For our own processing, we use Resque for smaller tasks, but you're still limited by the global interpreter lock and by other issues, such as having to deploy your code to a server, install gems, etc. We now use Storm (https://github.com/nathanmarz/storm) for our stream processing, and it works much better. Storm may be overkill for what you're trying to do, depending on how many images you process per day.

3 answers

solution1
4 ACCEPTED 2011-06-25 13:23:23

solution2
1 2011-06-25 05:25:03

solution3
0 2013-06-11 12:59:31