Extract content from multiple pages of same website

I have this script to extract data from multiple pages of the same website.我有这个脚本可以从同一个网站的多个页面中提取数据。 There are some 120 pages.大约有 120 页。

Here is the code I'm using for a single page.

$html = file_get_contents('https://www.example.com/product?page=1');
$dom = new DOMDocument;
@$dom->loadHTML($html);

$links = $dom->getElementsByTagName('div');

foreach ($links as $link) {
    file_put_contents('products.txt', $link->getAttribute('data-product-name') . PHP_EOL, FILE_APPEND);
}

How can I do this for multiple pages? The page links are incremental: the next page is https://www.example.com/product?page=2, and so on. How can I do it without creating a different file for each link?

What about this:

function extractContent($page)
{
    $html = file_get_contents('https://www.example.com/product?page='.$page);
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('div');

    foreach ($links as $link) {
        // skip empty attributes
        if (empty($link->getAttribute('data-product-name'))) {
            continue;
        }
        file_put_contents('products.txt', $link->getAttribute('data-product-name') .PHP_EOL, FILE_APPEND);
    }
}

// Loop over all 120 pages, appending every result to the same file.
for ($i = 1; $i <= 120; $i++) {
    extractContent($i);
}
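For robustness, a variation worth considering is to check whether file_get_contents() actually returned something before parsing, and to let DOMXPath select only the divs that carry the data-product-name attribute instead of iterating every div on the page. The following is a minimal sketch under the same assumed URL scheme (https://www.example.com/product?page=N); the function name extractPageProducts is hypothetical:

// A variation of the answer above: select only divs that have the
// attribute, and bail out early if the HTTP fetch fails.
function extractPageProducts(int $page): void
{
    $html = file_get_contents('https://www.example.com/product?page=' . $page);
    if ($html === false) {
        return; // fetch failed (network error, 4xx/5xx); skip this page
    }

    // Collect libxml warnings internally instead of silencing them with @.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();

    // The XPath query matches only <div> elements carrying data-product-name.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[@data-product-name]') as $div) {
        file_put_contents(
            'products.txt',
            $div->getAttribute('data-product-name') . PHP_EOL,
            FILE_APPEND
        );
    }
}

for ($i = 1; $i <= 120; $i++) {
    extractPageProducts($i);
}

Using libxml_use_internal_errors(true) keeps malformed-HTML warnings out of the output without the blanket @ error suppression, and the XPath filter removes the need for the empty-attribute check inside the loop.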
