
Multithreaded File Processing in PHP with pthreads

I'm trying to create a script that processes a number of files simultaneously. The rule is that each file can only be processed once, and the input file is deleted after it has been processed. I created this script:

<?php

// Libraries for reading files
require_once "spooler.php";

// Configuration section ///////////////////////////////////////////////////////

$config["data"] = "data";
$config["threads"] = 20;
$config["timer"] = 1;

// Array to store currently processed files
$config["processed_files"] = array();

// Processing section //////////////////////////////////////////////////////////

$timer = 0;
$pool = new Pool($config["threads"], \ProcessingWorker::class);

while (true) {

    // Read a batch of files from the data folder, up to the number of threads
    $files = Spooler::read_spool_file($config["data"], $config["threads"]);
    foreach ($files as $file) {
        // Check if the file is already processed
        if (in_array($file, $config["processed_files"])) continue;
        // Submit the file to the worker
        echo "Submitting $file\n";
        $config["processed_files"][$file] = $file;
        $pool->submit(new ProcessingJob($config, $file));
    }

    sleep($config["timer"]);
    $timer++;
}

$pool->shutdown();

// Processing thread section ///////////////////////////////////////////////////

class ProcessingJob extends Stackable {

    private $config;
    private $file;

    public function __construct($config, $file)
    {
        $this->config = $config;
        $this->file = $file;
        $this->complete = false;
    }

    public function run()
    {
        echo "Processing $this->file\n";
        // Pretend we're doing something that takes time
        sleep(mt_rand(1, 10));
        file_put_contents("_LOG", $this->file."\n", FILE_APPEND);

        // Delete the file
        @unlink($this->file);
        // Remove the file from the currently processing list
        unset($this->config["processed_files"][$this->file]);
    }

}

class ProcessingWorker extends Worker {
    public function run() {}
}

However, this code doesn't work well. It never processes the same file twice, but it sometimes skips files entirely. Here's the list of files that should be processed, but it only processes these files.

What am I doing wrong?

Output to the log file isn't synchronized. It's highly likely that two threads are calling file_put_contents on the log file concurrently and corrupting its output.

You should not write to a log file in this way.
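
One possible way to serialize the appends is to pass LOCK_EX alongside FILE_APPEND, so each write holds an exclusive advisory lock. This is a minimal sketch, not the original author's code: the helper name append_log_line and the temporary log path are assumptions made for illustration, and it has not been tested under pthreads.

```php
<?php
// Hypothetical helper (not part of the original script): append one line to
// a log file while holding an exclusive advisory lock for the write.
function append_log_line($logPath, $line)
{
    // FILE_APPEND writes at the end of the file; LOCK_EX asks PHP to take
    // an exclusive lock on the file for the duration of the write.
    $bytes = file_put_contents($logPath, $line . "\n", FILE_APPEND | LOCK_EX);
    return $bytes !== false;
}

// Illustrative usage with a throwaway log file (the path is an assumption).
$logPath = sys_get_temp_dir() . "/pthreads_demo_" . uniqid() . ".log";
append_log_line($logPath, "data/file1");
append_log_line($logPath, "data/file2");
```

In the question's run() method, the direct file_put_contents("_LOG", ...) call would be the place to add the LOCK_EX flag.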

If $config['processed_files'] is intended to be manipulated by multiple contexts, then it should be a thread-safe structure descended from pthreads, not a plain PHP array.
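
The sketch below shows part of why a plain array cannot serve as shared state. It is single-threaded and deliberately avoids pthreads; the class name CopyingJob and the method finish are hypothetical. PHP arrays have value semantics, so a job that receives $config works on its own copy, and an unset inside the job never reaches the caller's array.

```php
<?php
// Hypothetical stand-in for a job (no pthreads involved): it stores the
// array it is given, which PHP treats as an independent copy.
class CopyingJob
{
    private $config;

    public function __construct(array $config)
    {
        $this->config = $config; // value copy, not a reference
    }

    public function finish($file)
    {
        // Only the job's private copy is modified here.
        unset($this->config["processed_files"][$file]);
    }
}

$config = array("processed_files" => array("data/a.txt" => "data/a.txt"));
$job = new CopyingJob($config);
$job->finish("data/a.txt");

// The caller's array still holds the entry.
$stillListed = isset($config["processed_files"]["data/a.txt"]);
var_dump($stillListed); // bool(true)
```

In the question's script, this means the unset in ProcessingJob::run() does not remove anything from the main loop's $config["processed_files"].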
