Extract content from multiple pages of the same website
I have this script to extract data from multiple pages of the same website. There are some 120 pages.
Here is the code I'm using for a single page:
$html = file_get_contents('https://www.example.com/product?page=1');
$dom = new DOMDocument;
@$dom->loadHTML($html);
$links = $dom->getElementsByTagName('div');
foreach ($links as $link) {
    file_put_contents('products.txt', $link->getAttribute('data-product-name') . PHP_EOL, FILE_APPEND);
}
How can I do it for multiple pages? The page URLs are incremental, so the next page will be
https://www.example.com/product?page=2
and so on. How can I do it without creating a separate file for each link?
What about this:
function extractContent($page)
{
    $html = file_get_contents('https://www.example.com/product?page=' . $page);
    $dom = new DOMDocument;
    @$dom->loadHTML($html);
    $links = $dom->getElementsByTagName('div');
    foreach ($links as $link) {
        // skip divs without a product name attribute
        if (empty($link->getAttribute('data-product-name'))) {
            continue;
        }
        file_put_contents('products.txt', $link->getAttribute('data-product-name') . PHP_EOL, FILE_APPEND);
    }
}

for ($i = 1; $i <= 120; $i++) {
    extractContent($i);
}
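Across 120 requests, some fetches will occasionally fail, and appending to the file once per product means a lot of small writes. Below is a minimal sketch of a more defensive variant, assuming the same URL pattern and data-product-name attribute as above; the function name extractAllPages and the 0.25-second delay are my own choices for illustration, not part of the original answer. It checks whether file_get_contents() succeeded, uses libxml_use_internal_errors() instead of the @ suppression, buffers the names and writes products.txt once at the end, and pauses briefly between requests.

function extractAllPages($maxPage)
{
    // silence libxml warnings about malformed HTML instead of using @
    libxml_use_internal_errors(true);

    $names = [];
    for ($page = 1; $page <= $maxPage; $page++) {
        $html = file_get_contents('https://www.example.com/product?page=' . $page);
        if ($html === false) {
            // fetch failed; skip this page rather than passing false to loadHTML()
            continue;
        }

        $dom = new DOMDocument;
        $dom->loadHTML($html);

        foreach ($dom->getElementsByTagName('div') as $div) {
            $name = $div->getAttribute('data-product-name');
            if ($name !== '') {
                $names[] = $name;
            }
        }

        usleep(250000); // pause 0.25s between requests to be polite to the server
    }

    // one write at the end instead of one file_put_contents() call per product
    file_put_contents('products.txt', implode(PHP_EOL, $names) . PHP_EOL);
}

extractAllPages(120);

Buffering in memory is fine at this scale (120 pages of product names); if the list were much larger, appending per page with FILE_APPEND would be the safer trade-off.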