How to manage a long-running process in a Catalyst app?
This is my first Catalyst app and I'm not sure how to solve the following problem.
The user enters some data in a form and selects a file (up to 100 MB) for uploading. After the form is submitted, the actual computation takes up to 5 minutes and the results are stored in a DB.
What I want to do is run this process (and maybe also the file upload) in the background to avoid a server timeout. There should be some kind of feedback to the user (like a "Job has been started" message or a progress bar). The form should be blocked while the job is still running. A result page should be displayed once the job has finished.
In hours of reading I stumbled upon concepts like asynchronous requests, job queues, daemons, Gearman, or Catalyst::Plugin::RunAfterRequest.
How would you do it? Thanks for helping a web dev novice!
PS: In my current local app the work is done in parallel with Parallel::ForkManager. For the real app, would it be advisable to use a cloud computing service like Amazon EC2? Or just find a hoster who offers multi-core servers?
Put the job in a queue and do it in a different process, outside of the web application. While your Catalyst process is busy, even if it uses Catalyst::Plugin::RunAfterRequest, it cannot process other web requests.
There are very simple queuing systems, like File::Queue. Basically, you assign a job ID to the document and put it in the queue. Another process checks the queue and picks up new jobs.
You can save the job status in a database, or anywhere else accessible to the web application. On the front end, you can poll the job status every X seconds or minutes to give feedback to the user.
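The enqueue / pick up / poll cycle described above can be sketched with core Perl modules only. This is a minimal illustration, not File::Queue's API: the spool directory, the JSON job files, and the names enqueue, work_once, load_job and save_job are all hypothetical, and it assumes a single worker process (there is no locking between competing workers).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use JSON::PP qw(encode_json decode_json);

# Hypothetical spool directory shared by the web app and the worker.
# A real deployment would use a fixed path instead of a temp dir.
my $spool = tempdir(CLEANUP => 1);

sub job_path { File::Spec->catfile($spool, "$_[0].json") }

# Write the job record atomically: write a temp file, then rename it.
sub save_job {
    my ($job) = @_;
    my $path = job_path($job->{id});
    open my $fh, '>', "$path.tmp" or die "Cannot write $path.tmp: $!";
    print {$fh} encode_json($job);
    close $fh or die "Cannot close $path.tmp: $!";
    rename "$path.tmp", $path or die "Cannot rename to $path: $!";
}

# Read a job record; returns nothing if the job does not exist.
sub load_job {
    my ($id) = @_;
    open my $fh, '<', job_path($id) or return;
    local $/;
    my $json = <$fh>;
    close $fh;
    return decode_json($json);
}

# Web side: create a job and return its ID immediately.
sub enqueue {
    my (%args) = @_;
    my $id = join '-', time, $$, int rand 1_000_000;
    save_job({ id => $id, status => 'queued', args => \%args });
    return $id;
}

# Worker side: pick one queued job, run it, record the outcome.
# Assumes a single worker; several workers would need locking.
sub work_once {
    my ($code) = @_;
    opendir my $dh, $spool or die "Cannot open $spool: $!";
    my @ids = map { /^(.+)\.json$/ ? $1 : () } sort readdir $dh;
    closedir $dh;
    for my $id (@ids) {
        my $job = load_job($id) or next;
        next unless $job->{status} eq 'queued';
        $job->{status} = 'running';
        save_job($job);
        my $ok = eval { $code->($job->{args}); 1 };
        $job->{status} = $ok ? 'done' : 'failed';
        save_job($job);
        return $id;
    }
    return;
}
```

The controller would call enqueue() and hand the returned ID to the browser, the separate worker process would call work_once() in a loop, and a small status action could return load_job($id)->{status} for the front end to poll.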
You have to figure out how much memory and CPU you need. A multi-core CPU or multiple CPUs may not be required, even if you have several processes running. Choosing between a dedicated server and a cloud service like EC2 is more about flexibility (resizing, snapshots, etc.) vs. price.
Somehow I couldn't get the hang of File::Queue. For non-blocking parallel execution, I ended up using a combination of TheSchwartz and Parallel::Prefork, as implemented in the Foorum Catalyst app. Basically, there are 5 important elements. Maybe this summary will be helpful to others.
1) TheSchwartz DB
2) A client (DB handle) for the TheSchwartz DB
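package MyApp::TheSchwartz::Client;
use strict;
use warnings;
use Exporter 'import';
use TheSchwartz;
# lets callers write: use MyApp::TheSchwartz::Client qw/theschwartz/;
our @EXPORT_OK = qw(theschwartz);
# Returns a TheSchwartz client connected to the job database
sub theschwartz {
    my $theschwartz = TheSchwartz->new(
        databases => [ {
            dsn  => 'dbi:mysql:theschwartz',
            user => 'user',
            pass => 'pass',
        } ],
        verbose => 1,
    );
    return $theschwartz;
}
1;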
3) A job worker (where the actual work is done)
package MyApp::TheSchwartz::Worker::Test;
use strict;
use warnings;
use base qw( TheSchwartz::Moosified::Worker );
use MyApp::Model::DB;   # Catalyst DB connect_info
use MyApp::Schema;      # Catalyst DB schema
sub work {
    my $class = shift;
    my $job   = shift;
    my ($args) = $job->arg;    # the array ref passed to insert()
    my ($arg1, $arg2) = @$args;
    # re-use Catalyst DB schema
    my $connect_info = MyApp::Model::DB->config->{connect_info};
    my $schema = MyApp::Schema->connect($connect_info);
    # do the heavy lifting
    $job->completed();         # mark the job as finished
}
1;
4) A worker process TheSchwartzWorker.pl that monitors the job table non-stop
use strict;
use warnings;
use MyApp::TheSchwartz::Client qw/theschwartz/;   # db connection
use MyApp::TheSchwartz::Worker::Test;
use Parallel::Prefork;
my $client = theschwartz();
my $pm = Parallel::Prefork->new({
    max_workers  => 16,
    trap_signals => {
        TERM => 'TERM',
        HUP  => 'TERM',
        USR1 => undef,
    }
});
while ($pm->signal_received ne 'TERM') {
    $pm->start and next;    # parent: keep the pool of children full
    # child process from here on
    $client->can_do('MyApp::TheSchwartz::Worker::Test');
    my $delay = 10;    # when no job is available, the worker sleeps for $delay seconds
    $client->work( $delay );
    $pm->finish;
}
$pm->wait_all_children();
5) In the Catalyst controller: insert a new job into the job table and pass some arguments
use MyApp::TheSchwartz::Client qw/theschwartz/;
sub start : Chained('base') PathPart('start') Args(0) {
    my ($self, $c) = @_;
    # $arg1, $arg2 and $name are assumed to come from the submitted form
    my $client = theschwartz();
    $client->insert('MyApp::TheSchwartz::Worker::Test', [ $arg1, $arg2 ]);
    $c->response->redirect(
        $c->uri_for(
            $self->action_for('archive'),
            {mid => $c->set_status_msg("Run '$name' started")}
        )
    );
}
The new run is greyed out on the "archive" page until all results are available in the database.