How to make a loop change the URL
Basically, I'm trying to parse IMDb IDs from the given URLs. I'm trying to write a loop that changes the page number and continues scraping for IMDb TTs.
I expect the variable $page to increment by 1, so that $url changes and the foreach in each iteration receives a new URL and starts scraping again.
But the problem is that the loop only parses the same page over and over; the page number in the URL is not increasing by 1.
$url = 'http://www.imdb.com/search/title?genres=animation&page='.$page.''; # this URL
for ($page = 1; $page <= 5 ; $page++) {
foreach((new DOMXpath(@DOMDocument::loadHTMLFile($url)))->query($expr) as $obj)
preg_match($regex, $obj->value, $matches)
&& $ids[$matches[$match]] = 0;
;
$ids = array_keys($ids);
print implode("<br /> ", $ids);
}
Example: http://surveygun.com/tt.php
You can try something like this; change the limit in $i <= num to however many pages you want to loop through.
for ($i = 1; $i <= 165; $i++) {
    $url = 'http://www.imdb.com/search/title?genres=animation&page='.$i.'';
    // some code here
    sleep(2);
}
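The reason the loop in the question keeps hitting the same page is that the URL string is concatenated once, before the loop starts, so later changes to $page never reach it (and since $page is not yet set at that point, the page parameter is simply empty). Here is a minimal sketch of the difference; buildPageUrl is a hypothetical helper name used only for illustration, not something from the original post.

```php
<?php
// Hypothetical helper (not from the original post): builds the listing URL
// for one page number.
function buildPageUrl(int $page): string
{
    return 'http://www.imdb.com/search/title?genres=animation&page=' . $page;
}

// Concatenation happens once, at assignment time, so $stale never changes
// even though the loop below keeps incrementing $page.
$page  = 1;
$stale = 'http://www.imdb.com/search/title?genres=animation&page=' . $page;

$urls = [];
for ($page = 1; $page <= 3; $page++) {
    // Rebuilding the string inside the loop picks up the current $page.
    $urls[] = buildPageUrl($page);
}

echo $stale, PHP_EOL;                  // built before the loop, never updated
echo implode(PHP_EOL, $urls), PHP_EOL; // one URL per page
```

Moving the assignment inside the loop body, as in the snippet above it, is the whole fix; the helper function is just one way to keep the URL format in a single place.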
UPDATE (no dupes):
<?php
for ($i = 1; $i <= 5; $i++) {
    $url = "http://www.imdb.com/search/title?genres=animation&page=$i";
    $page = file_get_contents($url);
    preg_match_all("/id=\"sb_(tt\d{7})/", $page, $idinfo, PREG_SET_ORDER);
    foreach ($idinfo as $idnumber) {
        $idnumber = $idnumber[1];
        echo $idnumber.'<br>';
    }
}
?>
You might consider putting a sleep between iterations as a polite measure, e.g. sleep(2); which pauses the script for 2 seconds.
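If you want the deduplication and the pause in one place, something along these lines might work. This is only a sketch under assumptions: extractIds and collectIds are hypothetical helper names, the loose tt\d{7} pattern assumes IDs are "tt" plus seven digits as in the regex above, and the fetcher is passed in as a callable so the demo can run against a stub instead of the live site.

```php
<?php
// Hypothetical helper: pulls IMDb title IDs (tt + 7 digits) out of an HTML
// string. The 7-digit pattern is an assumption; real pages may differ.
function extractIds(string $html): array
{
    preg_match_all('/tt\d{7}/', $html, $matches);
    return $matches[0];
}

// Hypothetical helper: walks pages 1..$lastPage, pausing between requests.
// $fetch is any callable that takes a URL and returns HTML (or false).
function collectIds(callable $fetch, int $lastPage, int $delaySeconds = 2): array
{
    $seen = [];
    for ($i = 1; $i <= $lastPage; $i++) {
        $url  = "http://www.imdb.com/search/title?genres=animation&page=$i";
        $html = $fetch($url);
        if ($html !== false) {
            foreach (extractIds($html) as $id) {
                $seen[$id] = true; // array keys are unique, so repeats collapse
            }
        }
        if ($i < $lastPage) {
            sleep($delaySeconds); // be polite: wait before the next request
        }
    }
    return array_keys($seen);
}

// Offline demo with a stub fetcher, so nothing is actually downloaded.
$stub = function (string $url): string {
    return '<a href="/title/tt0000001/">A</a><a href="/title/tt0000001/">A</a>'
         . '<a href="/title/tt0000002/">B</a>';
};
print implode("<br />\n", collectIds($stub, 2, 0));
```

To run it against the real site you could pass 'file_get_contents' as the fetcher, e.g. collectIds('file_get_contents', 5), keeping the default 2-second delay.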