Speeding up a PHP App

I have a list of data that needs to be processed. The way it works right now is this:

  • A user clicks a process button.
  • The PHP code takes the first item that needs to be processed, takes 15-25 secs to process it, moves on to the next item, and so on.

This takes way too long. What I'd like instead is this:

  • The user clicks the process button.
  • A PHP script takes the first item and starts to process it.
  • Simultaneously, another instance of the script takes the next item and processes it.
  • And so on, so that around 5-6 of the items are being processed simultaneously and we get 6 items processed in 15-25 secs instead of just one.

Is something like this possible?

I was thinking that I could use CRON to launch an instance of the script every second. All items that need to be processed will be flagged as such in the MySQL database, so whenever an instance is launched through CRON, it will simply take the next item flagged to be processed and remove the flag.

Thoughts?

Edit: To clarify something, each 'item' is stored in a MySQL database table as a separate row. Whenever processing starts on an item, it is flagged as being processed in the DB, hence each new instance will simply grab the next row which is not being processed and process it. Hence I don't have to supply the items as command line arguments.
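One thing the CRON-plus-flag approach has to get right is that two instances launched at the same time must never claim the same row. A minimal sketch of an atomic "claim the next item" helper, using SQLite in place of MySQL purely so the snippet is self-contained (the table and column names here are hypothetical stand-ins for your real schema):

```php
<?php
// SQLite stands in for MySQL so this runs anywhere; the schema is made up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'pending')");
$db->exec("INSERT INTO items (payload) VALUES ('a'), ('b'), ('c')");

// Claim the next unprocessed row inside a transaction, so two instances
// launched at the same moment can never grab the same item.
function claimNextItem(PDO $db)
{
    $db->exec('BEGIN IMMEDIATE');
    $row = $db->query("SELECT id, payload FROM items
                       WHERE status = 'pending' ORDER BY id LIMIT 1")
              ->fetch(PDO::FETCH_ASSOC);
    if ($row !== false) {
        $stmt = $db->prepare("UPDATE items SET status = 'processing' WHERE id = ?");
        $stmt->execute([$row['id']]);
    }
    $db->exec('COMMIT');
    return $row === false ? null : $row;
}

$first  = claimNextItem($db);
$second = claimNextItem($db);
echo $first['payload'], ' ', $second['payload'], "\n"; // a b
```

With MySQL you would get the same effect with `SELECT ... FOR UPDATE` inside a transaction, or with a single `UPDATE ... LIMIT 1` that sets the flag before you read the row back.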

Here's one solution, not the greatest, but it will work fine on Linux:

Split the processing PHP into a separate CLI script in which:

  • The command line inputs include `$id` and `$item`
  • The script writes its PID to a file in `/tmp/$id.$item.pid`
  • The script echoes its results to stdout as XML, or something else that can be read back into PHP
  • When finished, the script deletes the `/tmp/$id.$item.pid` file

Your master script (presumably on your webserver) would do:

  • `exec("nohup php myprocessing.php $id $item > /tmp/$id.$item.xml &");` for each item
  • Poll the `/tmp/$id.$item.pid` files until all are deleted (a sleep/check poll is enough)
  • If they are never deleted, kill all the processing scripts and report failure
  • If successful, read from `/tmp/$id.$item.xml` to format/output results to the user
  • Delete the XML files if you don't want to cache them for later use

A backgrounded, nohup-started application will run independently of the script that started it.

This interested me sufficiently that I decided to write a POC.

test.php

<?php
$dir =  realpath(dirname(__FILE__));
$start = time();

// Time in seconds after which we give up and kill everything
$timeout = 25;

// The unique identifier for the request
$id = uniqid();

// Our "items" which would be supplied by the user
$items = array("foo", "bar", "0xdeadbeef");

// We exec a nohup command that is backgrounded which returns immediately
foreach ($items as $item) {
    exec("nohup php proc.php $id $item > $dir/proc.$id.$item.out &");
}

echo "<pre>";
// Run until timeout or all processing has finished
while(time() - $start < $timeout) 
{
  echo (time() - $start), " seconds\n";
  clearstatcache();    // Required since PHP will cache for file_exists
  $running = array();
  foreach($items as $item)
  {
      // If the pid file still exists the process is still running    
      if (file_exists("$dir/proc.$id.$item.pid")) {
          $running[] = $item;
      }
  }
  if (empty($running)) break;
  echo implode(',', $running), " running\n";
  flush();
  sleep(1);  
}

// Clean up if we timed out
if (!empty($running)) {
    clearstatcache();
    foreach ($items as $item) {
        // Kill the process of anything still running (i.e. that has a pid file)
        if (file_exists("$dir/proc.$id.$item.pid")
            && $pid = file_get_contents("$dir/proc.$id.$item.pid")) {
            posix_kill((int)$pid, 9);
            unlink("$dir/proc.$id.$item.pid");
            // Would want to log this in the real world
            echo "Failed to process: ", $item, " pid ", $pid, "\n";
        }
        // Delete the now-useless output
        unlink("$dir/proc.$id.$item.out");
    }
} else {
    echo "Successfully processed all items in ", time() - $start, " seconds.\n";
    foreach ($items as $item) {
    // Grab the processed data and delete the file
        echo(file_get_contents("$dir/proc.$id.$item.out"));
        unlink("$dir/proc.$id.$item.out");
    }
}
echo "</pre>";
?>

proc.php

<?php
$dir =  realpath(dirname(__FILE__));
$id = $argv[1];
$item = $argv[2];

// Write out our pid file
file_put_contents("$dir/proc.$id.$item.pid", posix_getpid());

for($i=0;$i<80;++$i)
{
    echo $item,':', $i, "\n";
    usleep(250000);
}

// Remove our pid file to say we're done processing
unlink("$dir/proc.$id.$item.pid");

?>

Put test.php and proc.php in the same folder of your server, load test.php and enjoy.

You will, of course, need nohup (Unix) and the PHP CLI to get this to work.

Lots of fun, I may find a use for it later.

Use an external work queue like Beanstalkd, which your PHP script writes a bunch of jobs to. You have as many worker processes pulling jobs from beanstalkd and processing them as fast as possible. You can spin up as many workers as you have memory/CPU for. Your job body should contain as little information as possible, maybe just some IDs which you hit the DB with. beanstalkd has a slew of client APIs, and itself has a very basic API; think memcached.

We use beanstalkd to process all of our background jobs, I love it. Easy to use, and it's very fast.
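As the "think memcached" comparison suggests, the beanstalkd protocol really is plain text over TCP. A put/reserve/delete exchange looks roughly like this, where `put` takes a priority, a delay, a TTR (time-to-run in seconds), and the body size in bytes; the JSON job body is just an example, and indented lines are the server's replies:

```
put 0 0 120 9
{"id":42}
    INSERTED 1

reserve
    RESERVED 1 9
    {"id":42}

delete 1
    DELETED
```

The worker `reserve`s a job, does the slow processing, then `delete`s it; if the worker dies before the TTR expires, beanstalkd hands the job to another worker automatically.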

There is no multithreading in PHP; however, you can use fork.

php.net: pcntl-fork

Or you could execute a system() command and start another process which is multithreaded.
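A minimal fork-per-item sketch, assuming the pcntl extension is available (CLI only; it is not usable under mod_php/FPM, and if it isn't compiled in, exec-ing background processes as in the POC above is the fallback). The `usleep` is a hypothetical stand-in for the real 15-25s of processing:

```php
<?php
// Fork one child per item, then have the parent wait for all of them.
$items = array("foo", "bar", "0xdeadbeef");
$pids = array();

foreach ($items as $item) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {
        // Child: do the slow work for one item, then exit so the child
        // doesn't fall through the loop and fork children of its own.
        usleep(100000); // stand-in for the real processing of $item
        exit(0);
    }
    $pids[] = $pid; // Parent: remember the child and keep forking
}

// Parent: block until every child has exited
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
echo "all ", count($pids), " children finished\n";
```

The total wall-clock time is roughly one item's processing time rather than the sum, which is exactly the speedup the question asks for.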

Can you implement threading in JavaScript on the client side? I seem to remember a JavaScript library (from Google, perhaps?) that implements it. Google it and I'm sure you'll find something. I've never done it, but I know it's possible. Anyway, your client-side JavaScript could activate (via AJAX) a PHP script once for each item in separate threads. That might be easier than trying to do it all on the server side.

-don

If you are running a high-traffic PHP server, you are INSANE if you do not use the Alternative PHP Cache: http://php.net/manual/en/book.apc.php. You do not have to make code modifications to run APC.

Another useful technique that can work along with APC is using the Smarty template system which allows you to cache output so that pages do not have to be rebuilt.

To solve this problem, I've used two different products: Gearman and RabbitMQ.

The benefit of putting your jobs into some sort of queuing software like Gearman or Rabbit is that if you have multiple machines, they can all participate in processing items off the queue(s).

Gearman is easier to set up, so I'd suggest poking around with it a bit first. If you find you need something more heavy-duty in terms of queue robustness, look into RabbitMQ.

You can use pcntl_fork() and family to fork a process; however, you may need something like IPC to communicate back to the parent process that the child process (the one you forked) has finished.

You could have them write to shared memory, like via memcache or a DB.

You could also have the child process write the completed data to a file that the parent process keeps checking: as each child process completes, the file is created/written/updated, and the parent process can grab it, one at a time, and throw the results back to the caller/client.

The parent's job is to control the queue, to make sure the same data isn't processed twice, and to sanity-check the children (kill any runaway process and start over, etc.).

Something else to keep in mind: on Windows platforms you are going to be severely limited; I don't even think you have access to pcntl_* unless you compiled PHP with support for it.

Also, can you cache the data once it's been processed, or is it unique data every time? That would surely speed things up.
