How to manage a long running process in a Catalyst App?

This is my first Catalyst app and I'm not sure how to solve the following problem.

The user enters some data in a form and selects a file (up to 100MB) for uploading. After submitting the form, the actual computation takes up to 5 minutes and the results are stored in a DB.

What I want to do is run this process (and maybe also the file upload) in the background to avoid a server timeout. There should be some kind of feedback to the user (like a message "Job has been started" or a progress bar). The form should be blocked while the job is still running. A result page should be displayed once the job has finished.

In hours of reading I stumbled upon concepts like asynchronous requests, job queues, daemons, Gearman, or Catalyst::Plugin::RunAfterRequest.

How would you do it? Thanks for helping a web dev novice!

PS: In my current local app the work is done in parallel with Parallel::ForkManager. For the real app, would it be advisable to use a cloud computing service like Amazon EC2? Or just find a hosting provider that offers multi-core servers?

Put the job in a queue and do it in a different process, outside of the web application. While your Catalyst process is busy with the job, even if you use Catalyst::Plugin::RunAfterRequest, it cannot process other web requests.

There are very simple queuing systems, like File::Queue. Basically, you assign a job ID to the document and put it in the queue. Another process checks the queue and picks up new jobs.
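To make the queue idea concrete, here is a minimal sketch using only core Perl modules. It is not the File::Queue API: the spool directory layout and the helper names `enqueue_job` and `claim_next_job` are made up for illustration. The web application would call `enqueue_job`, and a separate worker process would call `claim_next_job` in a loop.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $counter = 0;

# Hypothetical helper: store the payload as "<job id>.job" in the spool directory.
sub enqueue_job {
    my ($dir, $payload) = @_;
    my $job_id = sprintf '%010d-%d-%04d', time, $$, ++$counter;
    my $tmp    = File::Spec->catfile($dir, "$job_id.tmp");
    my $final  = File::Spec->catfile($dir, "$job_id.job");

    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} $payload;
    close $fh or die "Cannot close $tmp: $!";

    # Publish with a rename so a worker never sees a half-written file.
    rename $tmp, $final or die "Cannot publish $final: $!";
    return $job_id;
}

# Hypothetical helper: claim the oldest pending job, or return an empty list.
sub claim_next_job {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot read $dir: $!";
    my @pending = sort grep { /\.job\z/ } readdir $dh;
    closedir $dh;

    for my $name (@pending) {
        (my $job_id = $name) =~ s/\.job\z//;
        my $from = File::Spec->catfile($dir, $name);
        my $to   = File::Spec->catfile($dir, "$job_id.run");

        # If the rename fails, another worker claimed the job first.
        next unless rename $from, $to;

        open my $fh, '<', $to or die "Cannot read $to: $!";
        my $payload = do { local $/; <$fh> };
        close $fh;
        return ($job_id, $payload);
    }
    return;
}

# Demo with a throwaway spool directory.
my $spool = tempdir(CLEANUP => 1);
enqueue_job($spool, 'parameters for the computation');
my ($claimed_id, $claimed_payload) = claim_next_job($spool);
print "Claimed job $claimed_id: $claimed_payload\n";
```

The rename is what keeps two workers from picking up the same job, assuming the spool directory lives on a local filesystem. A real queue module or a database-backed queue also handles retries and crashed workers, which this sketch does not.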

You can save the job status in a database, or anywhere else the web application can access. On the front end, you can poll the job status every X seconds or minutes to give feedback to the user.
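The polling logic itself is small. The sketch below shows it in Perl with the status lookup passed in as a callback. In a browser the same loop would be written in JavaScript against a status URL. The status names (`queued`, `running`, `done`, `failed`) and the function `poll_job_status` are assumptions for illustration, and the callback stands in for a real database query.

```perl
use strict;
use warnings;

# Assumed status values; the real set depends on the application.
my %TERMINAL = map { $_ => 1 } qw(done failed);

# Ask for the status until the job reaches a terminal state or we give up.
sub poll_job_status {
    my (%opt) = @_;
    my $fetch    = $opt{fetch_status};      # code ref: job ID -> status string
    my $interval = $opt{interval}     // 5; # seconds between lookups
    my $max      = $opt{max_attempts} // 60;

    for my $attempt (1 .. $max) {
        my $status = $fetch->($opt{job_id}) // 'unknown';
        return $status if $TERMINAL{$status};
        sleep $interval if $attempt < $max && $interval > 0;
    }
    return 'timeout';
}

# Demo: a canned sequence of answers instead of a database lookup.
my @canned = qw(queued running running done);
my $final_status = poll_job_status(
    job_id       => 42,
    interval     => 0,
    fetch_status => sub { shift @canned },
);
print "Final status: $final_status\n";
```

Capping the number of attempts means the page can show an error instead of polling forever if the worker dies.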

You have to figure out how much memory and CPU you need. A multi-core CPU or multiple CPUs may not be required, even if you have several processes running. Choosing between a dedicated server and a cloud service like EC2 is mostly a trade-off between flexibility (resizing, snapshots, etc.) and price.

Somehow I couldn't get the idea of File::Queue. For non-blocking parallel execution, I ended up using a combination of TheSchwartz and Parallel::Prefork, as implemented in the Foorum Catalyst app. Basically, there are 5 important elements. Maybe this summary will be helpful to others.

1) TheSchwartz DB (a database holding the job tables, created from the schema that ships with TheSchwartz)

2) A client (DB handle) for the TheSchwartz DB

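package MyApp::TheSchwartz::Client;

use strict;
use warnings;
use Exporter 'import';
use TheSchwartz;

# Allows callers to write: use MyApp::TheSchwartz::Client qw/theschwartz/;
our @EXPORT_OK = qw(theschwartz);

# Returns a TheSchwartz client connected to the job database
sub theschwartz {
    my $theschwartz = TheSchwartz->new(
        databases => [ {
            dsn  => 'dbi:mysql:theschwartz',
            user => 'user',
            pass => 'pass',
        } ],
        verbose => 1,
    );
    return $theschwartz;
}

1;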

3) A job worker (where the actual work is done)

package MyApp::TheSchwartz::Worker::Test;

use strict;
use warnings;
use base qw( TheSchwartz::Moosified::Worker );
use MyApp::Model::DB;      # Catalyst DB connect_info
use MyApp::Schema;         # Catalyst DB schema

sub work {
    my $class = shift;
    my $job = shift;
    my ($args) = $job->arg;            # array ref passed to insert() in the controller
    my ($arg1, $arg2) = @$args;

    # re-use Catalyst DB schema
    my $connect_info = MyApp::Model::DB->config->{connect_info};
    my $schema = MyApp::Schema->connect($connect_info);

    # do the heavy lifting

    # mark the job as done so that it is removed from the queue
    $job->completed();
}

1;

4) A worker process TheSchwartzWorker.pl that continuously polls the job table

use strict;
use warnings;
use MyApp::TheSchwartz::Client qw/theschwartz/;    # db connection
use MyApp::TheSchwartz::Worker::Test;
use Parallel::Prefork;

my $pm = Parallel::Prefork->new({
    max_workers  => 16,
    trap_signals => {
        TERM => 'TERM',
        HUP  => 'TERM',
        USR1 => undef,
    }
});

while ($pm->signal_received ne 'TERM') {
    $pm->start and next;    # parent: keep the pool of child processes full

    # child: create the client after the fork so that each worker has its own DB connection
    my $client = theschwartz();
    $client->can_do('MyApp::TheSchwartz::Worker::Test');
    my $delay = 10;    # When no job is available, the working process will sleep for $delay seconds
    $client->work( $delay );

    $pm->finish;
}
$pm->wait_all_children();

5) In the Catalyst controller: insert a new job into the job table and pass some arguments

use MyApp::TheSchwartz::Client qw/theschwartz/;

sub start : Chained('base') PathPart('start') Args(0) {
    my ($self, $c) = @_;

    # $arg1, $arg2 and $name are placeholders for values taken from the submitted form
    my $client = theschwartz();
    $client->insert('MyApp::TheSchwartz::Worker::Test', [ $arg1, $arg2 ]);

    $c->response->redirect(
        $c->uri_for(
            $self->action_for('archive'),
            {mid => $c->set_status_msg("Run '$name' started")}
        )
    );
}

The new run is greyed out on the "archive" page until all results are available in the database.
