
How to make the loop change the URL

Basically, I'm trying to parse IMDb IDs from the given URLs. I'm trying to write a loop that changes the page number and keeps scraping IMDb tt IDs.

I expect the variable $page to increment by 1 on each iteration, so that $url changes and the foreach in each pass of the loop receives a new URL and starts scraping again.

But the problem is that the loop parses the same page over and over; the page number in the URL never increases by 1.

   $url   = 'http://www.imdb.com/search/title?genres=animation&page='.$page.''; # this URL

for ($page = 1; $page <= 5 ; $page++) {

foreach((new DOMXpath(@DOMDocument::loadHTMLFile($url)))->query($expr) as $obj)
    preg_match($regex, $obj->value, $matches)
      && $ids[$matches[$match]] = 0;
    ;
$ids = array_keys($ids);


    print implode("<br /> ", $ids);

}

Example: http://surveygun.com/tt.php

You can try something like this. Change the upper bound in $i <= num to however many pages you want to loop through.

for( $i= 1 ; $i <= 165 ; $i++ ){
  $url   = 'http://www.imdb.com/search/title?genres=animation&page='.$i.'';

  // some code here

  sleep(2);
}
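The likely reason the original loop kept fetching the same page is that $url was assigned once, before the loop, so the string was fixed before $page ever changed. A minimal sketch of rebuilding the URL on every iteration follows; buildPageUrl is a hypothetical helper name used only for illustration, and no network request is made here.

```php
<?php
// Hypothetical helper (not from the original post): builds the search URL for one page.
function buildPageUrl(int $page): string
{
    return 'http://www.imdb.com/search/title?genres=animation&page=' . $page;
}

$urls = [];
for ($page = 1; $page <= 5; $page++) {
    // The URL is rebuilt inside the loop, so it reflects the current $page.
    $urls[] = buildPageUrl($page);
}

print implode("\n", $urls) . "\n";
```

Each entry in $urls can then be passed to whatever fetching and parsing code you already have.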

UPDATE (no dupes):

<?php
for ($i = 1; $i <= 5; $i++) {
    // Build the URL inside the loop so the page number changes each time.
    $url = "http://www.imdb.com/search/title?genres=animation&page=$i";
    $page = file_get_contents($url);
    // Capture the tt ID from every id="sb_tt..." attribute on the page.
    preg_match_all("/id=\"sb_(tt\d{7})/", $page, $idinfo, PREG_SET_ORDER);
    foreach ($idinfo as $idnumber) {
        $idnumber = $idnumber[1];
        echo $idnumber.'<br>';
    }
}
?>
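If the same ID can show up more than once in a page, one way to drop repeats is to collect the matches and pass them through array_unique. The sketch below works on a made-up HTML string rather than a live page; the /title/tt.../ link pattern, the sample markup, and the extractImdbIds name are all assumptions for illustration, not the actual IMDb markup.

```php
<?php
// Hypothetical helper: returns the distinct tt IDs found in an HTML string.
function extractImdbIds(string $html): array
{
    // Assumed pattern: IDs appear in links such as /title/tt0000001/.
    preg_match_all('~/title/(tt\d{7,})/~', $html, $matches);
    // $matches[1] holds the captured IDs; remove repeats and reindex from 0.
    return array_values(array_unique($matches[1]));
}

// Made-up sample markup in which the first ID appears twice.
$html = '<a href="/title/tt0000001/">a</a>'
      . '<a href="/title/tt0000001/">a again</a>'
      . '<a href="/title/tt0000002/">b</a>';

$ids = extractImdbIds($html);
print implode("<br /> ", $ids) . "\n";
```

In the loop above, the string returned by file_get_contents could be passed to such a function in place of the sample $html.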

You might consider putting a sleep between iterations as a polite measure, e.g. sleep(2); which pauses the script for 2 seconds.
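As a rough sketch of that idea, the loop can pause only between requests, so nothing is wasted after the last page. The fetchPages name and its callable parameter are hypothetical; the fetcher is stubbed out here and the delay is set to 0 so the example runs instantly without touching the network.

```php
<?php
// Hypothetical helper: calls $fetch for pages 1..$lastPage, pausing between calls.
function fetchPages(int $lastPage, callable $fetch, int $delaySeconds = 2): array
{
    $results = [];
    for ($i = 1; $i <= $lastPage; $i++) {
        $results[$i] = $fetch($i);
        // Sleep between requests only, not after the final one.
        if ($i < $lastPage && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return $results;
}

// Stub fetcher standing in for a real request such as file_get_contents($url).
$stub = function (int $i): string {
    return "page $i";
};

$pages = fetchPages(3, $stub, 0);
print_r($pages);
```

With a real fetcher you would leave the delay at a couple of seconds, as suggested above.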
