
PHP crawler script

I grabbed a script from here to crawl a website, put it on my server, and it works. The only issue is that if I set the crawl depth to anything above 4, it doesn't work. I'm wondering whether that's due to the server's lack of resources or to the code itself.

<?php

error_reporting(E_ALL); 

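function crawl_page($url, $depth)
{
    // Remembers every URL already visited, shared across all recursive calls
    static $seen = array();
    if (isset($seen[$url]) || $depth === 0) {
        return;
    }
    $seen[$url] = true;

    // Fetch and parse the page; @ hides warnings (malformed HTML, failed fetches)
    $dom = new DOMDocument('1.0');
    @$dom->loadHTMLFile($url);

    // Follow every <a href> on the page, one level deeper
    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $element) {
        $href = $element->getAttribute('href');
        // Links that don't start with "http" are appended to the current URL
        if (0 !== strpos($href, 'http')) {
            $href = rtrim($url, '/') . '/' . ltrim($href, '/');
        }
        crawl_page($href, $depth - 1);
    }
    // A URL is printed only after all of its links have been crawled
    echo "URL:",$url,PHP_EOL;
    echo  "<br/>";
}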
crawl_page("http://www.mangastream.com/", 2);
?>
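One thing worth checking (an observation about the code above, not something the thread confirms): every link on every fetched page is followed, so with a hypothetical 50 links per page, depth 5 could mean up to 1 + 50 + 2500 + 125000 + 6250000 fetches. The URL joining also turns a link like "/read" on http://www.mangastream.com/manga into http://www.mangastream.com/manga/read instead of http://www.mangastream.com/read, so many "new" URLs are really repeats (or invalid) and $seen rarely catches them. The sketch below is a hypothetical rework, not the original script: a breadth-first crawl with a page budget ($maxPages), a simplified resolve_url() helper (hypothetical; it does not collapse "../" segments), and a $fetch callable so it can be exercised without network access.

```php
<?php
// Hypothetical helper: resolve $href against the page URL $base.
// Simplified on purpose: handles absolute, protocol-relative, root-relative,
// query-only and plain relative links, but does not collapse "../" segments.
function resolve_url(string $base, string $href): ?string
{
    $href = trim($href);
    // Skip empty links, in-page anchors and non-HTTP schemes.
    if ($href === '' || $href[0] === '#' || preg_match('/^(mailto|javascript|tel):/i', $href)) {
        return null;
    }
    // "page.html#top" and "page.html" are the same document.
    $href = explode('#', $href, 2)[0];
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                // already absolute
    }
    $p = parse_url($base);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = $p['path'] ?? '/';
    if (substr($href, 0, 2) === '//') {
        return $p['scheme'] . ':' . $href;          // protocol-relative
    }
    if ($href[0] === '/') {
        return $origin . $href;                      // root-relative
    }
    if ($href[0] === '?') {
        return $origin . $path . $href;              // new query on the same path
    }
    // Plain relative link: resolve against the directory of the current page.
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Breadth-first crawl using the same depth meaning as crawl_page(): the start
// page has depth $depth, its links $depth - 1, and nothing at depth 0 is listed.
// $maxPages caps how many URLs are listed so the run cannot grow without bound.
// $fetch is any callable returning the HTML for a URL, or null on failure.
function crawl(string $start, int $depth, int $maxPages, callable $fetch): array
{
    if ($depth <= 0) {
        return [];
    }
    libxml_use_internal_errors(true);   // keep malformed-HTML warnings quiet
    $seen = [$start => true];
    $queue = [[$start, $depth]];
    $visited = [];
    while ($queue && count($visited) < $maxPages) {
        [$url, $d] = array_shift($queue);
        $visited[] = $url;
        if ($d <= 1) {
            continue;                   // its links would be depth 0, so skip the fetch
        }
        $html = $fetch($url);
        if ($html === null || $html === '') {
            continue;
        }
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        libxml_clear_errors();
        foreach ($dom->getElementsByTagName('a') as $a) {
            $next = resolve_url($url, $a->getAttribute('href'));
            if ($next !== null && !isset($seen[$next])) {
                $seen[$next] = true;
                $queue[] = [$next, $d - 1];
            }
        }
    }
    return $visited;
}
```

For a real run, $fetch could be something like function ($u) { $html = @file_get_contents($u); return $html === false ? null : $html; }, starting with a small $maxPages. Whether this alone makes depth 5 work on your server is untested; it only bounds the amount of work.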

EDIT:

I turned on error reporting for the script, and all I get is this:

Error 324 (net::ERR_EMPTY_RESPONSE): Unknown error.

Make sure all error messages are turned on (display_errors as well as error_reporting). That should give you more insight into why it's crashing.
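A minimal sketch of those settings at the top of the script, assuming you can't or don't want to edit php.ini. Logging to a file is an addition not mentioned in the answer: if the browser only shows an empty response, a log file may still record why the script stopped.

```php
<?php
// Development only: print every error, warning and notice in the response.
ini_set('display_errors', '1');
error_reporting(E_ALL);

// Also write errors to a log file.
// "crawler-errors.log" is a hypothetical file name; the directory must be writable.
ini_set('log_errors', '1');
ini_set('error_log', __DIR__ . '/crawler-errors.log');
```

Settings made with ini_set() only take effect once the script starts running, so a parse error in the same file still won't be displayed; for that case, set them in php.ini instead.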

Also, keep in mind that crawling can be illegal, depending on what you intend to do with the data.
