
How can I download and parse a portion of a web page?

I don't want to download the whole web page; that takes time and a lot of memory.

How can I download just a portion of the page and then parse it?

Suppose I only need the <div id="entryPageContent" class="cssBaseOne">...</div>. How can I do that?

You can't ask a URL for "only this piece of HTML". HTTP supports partial downloads only as byte ranges; it has no concept of HTML/XML document trees.

So you'll have to download the entire page, load it into a DOM parser, and then extract only the portion(s) you need.

e.g.:

$html = file_get_contents('http://example.com/somepage.html');

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely well-formed
$dom->loadHTML($html);
libxml_clear_errors();

$div = $dom->getElementById('entryPageContent');
$content = $dom->saveHTML($div); // serialize just that node; DOMElement itself has no saveHTML() method

Using this:

curl_setopt($ch, CURLOPT_RANGE, "0-10000");

will make cURL request only the first ~10 KB of the page. Note that this only works if the server supports range requests; many dynamically generated responses (CGI, PHP, ...) ignore the Range header and send the full body anyway.
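Putting the two answers together, here is a minimal sketch: fetch at most the first N bytes with cURL, then parse whatever arrived with DOMDocument. The helper names `fetchRange()` and `extractDivById()` are illustrative, not part of any library, and the sketch assumes the target server actually honors Range requests (a truncated body may cut the div off mid-way).

```php
<?php
// Fetch at most $maxBytes bytes of $url via an HTTP Range request.
// Servers that ignore Range will simply return the full body.
function fetchRange(string $url, int $maxBytes)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_RANGE, '0-' . ($maxBytes - 1));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Parse $html and return the outer HTML of the element with the given id,
// or null if it is not present (e.g. because the download was cut short).
function extractDivById(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate invalid/truncated markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    $div = $dom->getElementById($id);
    return $div === null ? null : $dom->saveHTML($div);
}

// Pure parsing demo on a hardcoded fragment (no network needed):
$html = '<html><body><div id="entryPageContent">Hello</div></body></html>';
echo extractDivById($html, 'entryPageContent'), "\n";
```

The two concerns are kept separate on purpose: if the range-limited fetch happens to truncate the page before the closing tag of the div you want, the parser may still recover part of it, but you cannot rely on that, so checking for a null result is essential.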
