
Multi-threading a for statement with PHP

I'm using the following function to check whether images exist at their location. Each time the script runs it loads about 40-50 URLs, so the page takes a long time to load. I was thinking of using threading for the for statement (at the end of the script) but couldn't find many examples of how to do that. I'm not very familiar with multi-threading in PHP, but I found an example here using popen.

My script:

function get_image_dim($sURL) {

  try {
    $hSock = @ fopen($sURL, 'rb');
    if ($hSock) {
      while(!feof($hSock)) {
        $vData = fread($hSock, 300);
        break;
      }
      fclose($hSock);
      if (strpos(' ' . $vData, 'JFIF')>0) {
        $vData = substr($vData, 0, 300);
        $asResult = unpack('H*',$vData);        
        $sBytes = $asResult[1];
        $width = 0;
        $height = 0;
        $hex_width = '';
        $hex_height = '';
        if (strstr($sBytes, 'ffc2')) {
          $hex_height = substr($sBytes, strpos($sBytes, 'ffc2') + 10, 4);
          $hex_width = substr($sBytes, strpos($sBytes, 'ffc2') + 14, 4);
        } else {
          $hex_height = substr($sBytes, strpos($sBytes, 'ffc0') + 10, 4);
          $hex_width = substr($sBytes, strpos($sBytes, 'ffc0') + 14, 4);
        }
        $width = hexdec($hex_width);
        $height = hexdec($hex_height);
        return array('width' => $width, 'height' => $height);
      } elseif (strpos(' ' . $vData, 'GIF')>0) {
        $vData = substr($vData, 0, 300);
        $asResult = unpack('h*',$vData);
        $sBytes = $asResult[1];
        $sBytesH = substr($sBytes, 16, 4);
        $height = hexdec(strrev($sBytesH));
        $sBytesW = substr($sBytes, 12, 4);
        $width = hexdec(strrev($sBytesW));
        return array('width' => $width, 'height' => $height);
      } elseif (strpos(' ' . $vData, 'PNG')>0) {
        $vDataH = substr($vData, 22, 4);
        $asResult = unpack('n',$vDataH);
        $height = $asResult[1];        
        $vDataW = substr($vData, 18, 4);
        $asResult = unpack('n',$vDataW);
        $width = $asResult[1];        
        return array('width' => $width, 'height' => $height);
      }
    }
  } catch (Exception $e) {}
  return FALSE;
}

for ($y = 0; $y < $image_count; $y++) {
    $dim = get_image_dim($images[$y]);
    if (empty($dim)) {
        echo $images[$y];
        unset($images[$y]);
    }
}
$images = array_values($images);
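As an aside, the manual JPEG/GIF/PNG header parsing above can be replaced by PHP's built-in getimagesize(), which reads dimensions directly from a path or URL (fetching a URL assumes allow_url_fopen is enabled). This is a sketch of a substitute, not the original approach:

```php
<?php
// Sketch: getimagesize() returns array(width, height, ...) or FALSE on
// failure, so missing images can be filtered exactly as before.
function get_image_dim_builtin($sURL) {
    $info = @getimagesize($sURL);
    if ($info === false) {
        return false;
    }
    return array('width' => $info[0], 'height' => $info[1]);
}
```

This does one blocking fetch per image, so it has the same sequential-latency problem; it only removes the hand-rolled byte parsing.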

The popen example I found was:

for ($i=0; $i<10; $i++) {
    // open ten processes
    for ($j=0; $j<10; $j++) {
        $pipe[$j] = popen('script.php', 'w');
    }

    // wait for them to finish
    for ($j=0; $j<10; ++$j) {
        pclose($pipe[$j]);
    }
}
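To make the popen pattern fit this problem, the usual split is: script.php (a hypothetical worker, not shown in the question) would contain get_image_dim() plus a couple of lines that read a URL from $argv[1] and echo the result, while the parent launches one worker per URL and collects the output. A sketch of the parent side, under those assumptions:

```php
<?php
// Hypothetical parent side of the popen split. Assumes a worker script.php
// containing get_image_dim() plus roughly:
//     $dim = get_image_dim($argv[1]);
//     echo $dim ? json_encode($dim) : '';
// Returns the URLs whose worker produced output (i.e. images that exist).
function filter_images_popen(array $images) {
    $pipes = array();
    foreach ($images as $i => $url) {
        // 'r' opens each child process for reading its stdout;
        // all children are started before any output is read.
        $pipes[$i] = popen('php script.php ' . escapeshellarg($url), 'r');
    }
    foreach ($pipes as $i => $p) {
        $out = stream_get_contents($p);
        pclose($p);
        if (trim($out) === '') {   // no output: treat the image as missing
            echo $images[$i];
            unset($images[$i]);
        }
    }
    return array_values($images);
}
```

The concurrency comes from starting every child before reading any of them; stream_get_contents() then blocks only until that particular worker finishes. Spawning one PHP process per URL is heavy for 40-50 URLs, which is part of why the answer below steers toward curl_multi instead.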

I'm not sure which part of my code has to go into script.php. I tried moving the whole script, but that didn't work.

Any ideas on how I can implement this, or is there a better way to multi-thread it? Thanks.

PHP does not support multi-threading natively. You can get it with the pthreads extension, but having some experience with it, I can say with confidence that it is too much for your needs.

Your best bet is to use curl: you can run multiple requests concurrently with curl_multi_init. Based on the example on PHP.net, the following may work for your needs:

function curl_multi_callback(Array $urls, $callback, $cache_dir = NULL, $age = 600) {
    $return = array();

    $conn = array();

    $max_age = time()-intval($age);

    $mh = curl_multi_init();

    if(is_dir($cache_dir)) {
        foreach($urls as $i => $url) {
            $cache_path = $cache_dir.DIRECTORY_SEPARATOR.sha1($url).'.ser';
            if(file_exists($cache_path)) {
                $stat = stat($cache_path);
                if($stat['atime'] > $max_age) {
                    $return[$i] = unserialize(file_get_contents($cache_path));
                    unset($urls[$i]);
                } else {
                    unlink($cache_path);
                }
            }
        }
    }

    foreach ($urls as $i => $url) {
        $conn[$i] = curl_init($url);
        curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);
        curl_multi_add_handle($mh, $conn[$i]);
    }

    do {
        $status = curl_multi_exec($mh, $active);
        // Keep attempting to get info so long as we get info
        while (($info = curl_multi_info_read($mh)) !== FALSE) {
            // We received information from Multi
            if (false !== $info) {
                //  The connection was successful
                $handle = $info['handle'];
                // Find the index of the connection in `conn`
                $i = array_search($handle, $conn);
                if($info['result'] === CURLE_OK) {
                    // If we found an index and that index is set in the `urls` array
                    if(false !== $i && isset($urls[$i])) {
                        $content = curl_multi_getcontent($handle);
                        $return[$i] = $data = array(
                            'url'     => $urls[$i],
                            'content' => $content,
                            'parsed'  => call_user_func($callback, $content, $urls[$i]),
                        );
                        if(is_dir($cache_dir)) {
                            file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
                        }
                    }
                } else {
                    // Handle failures how you will
                }
                // Close, even if a failure
                curl_multi_remove_handle($mh, $handle);
                unset($conn[$i]);
            }
        }
    } while ($status === CURLM_CALL_MULTI_PERFORM || $active);


    // Cleanup and resolve any remaining connections (unlikely)
    if(!empty($conn)) {
        foreach ($conn as $i => $handle) {
            if(isset($urls[$i])) {
                $content = curl_multi_getcontent($handle);
                $return[$i] = $data = array(
                    'url'     => $urls[$i],
                    'content' => $content,
                    'parsed'  => call_user_func($callback, $content, $urls[$i]),
                );
                if(is_dir($cache_dir)) {
                    file_put_contents($cache_dir.DIRECTORY_SEPARATOR.sha1($urls[$i]).'.ser', serialize($data));
                }
            }
            curl_multi_remove_handle($mh, $handle);
            unset($conn[$i]);
        }
    }

    curl_multi_close($mh);

    return $return;
}

$return = curl_multi_callback($urls, function($data, $url) {
    echo "got $url\n";
    return array('some stuff');
}, '/tmp', 30);

//print_r($return);
/*
$url_dims = array(
    'url'     => 'http://www......',
    'content' => raw content
    'parsed'  => return of get_image_dim
)
*/

Just restructure your original function get_image_dim to consume the raw data and output whatever you are looking for.
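For instance (a sketch, not the answer's code): since curl_multi_callback already has the response body in hand, the reworked function only needs to parse bytes. Here getimagesizefromstring(), available since PHP 5.4, stands in for the question's manual header parsing:

```php
<?php
// Sketch: a get_image_dim() variant that consumes raw bytes instead of
// opening the URL itself, suitable as the $callback's parsing step.
function get_image_dim_from_data($content) {
    $info = @getimagesizefromstring($content); // PHP >= 5.4
    return $info ? array('width' => $info[0], 'height' => $info[1]) : false;
}

// Hypothetical wiring into the helper above:
// $return = curl_multi_callback($images, function($content, $url) {
//     return get_image_dim_from_data($content);
// }, '/tmp', 30);
```

Each entry of $return would then carry the parsed dimensions under the 'parsed' key, or FALSE for URLs that did not yield an image.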

This is not a complete solution; there may be errors or idiosyncrasies you need to resolve, but it should serve as a good starting point.

Updated to include caching. This took a test I was running on 18 URLs from 1 second down to 0.007 seconds (with cache hits).

Note: you may not want to cache the full request contents, as I did here; caching just the URL and the parsed data would be leaner.
