How to make a loop change the URL

Question

Basically, I'm trying to parse IMDB IDs from the given URLs. I'm trying to write a loop that changes the page number and keeps scraping for IMDB TTs.

I expect the variable $page to increment by 1, so that $url changes and the foreach in each iteration receives a new URL and starts scraping again.

But the problem is that the loop keeps parsing the same page over and over; the page number does not increase by 1.

$url = 'http://www.imdb.com/search/title?genres=animation&page='.$page.''; # this URL

for ($page = 1; $page <= 5; $page++) {

    foreach ((new DOMXpath(@DOMDocument::loadHTMLFile($url)))->query($expr) as $obj)
        preg_match($regex, $obj->value, $matches)
            && $ids[$matches[$match]] = 0;

    $ids = array_keys($ids);

    print implode("<br /> ", $ids);

}

Example: http://surveygun.com/tt.php

Answer 1

You can try something like this; change the num in $i <= num to however many pages you want to loop through.

for( $i= 1 ; $i <= 165 ; $i++ ){
  $url   = 'http://www.imdb.com/search/title?genres=animation&page='.$i.'';

  // some code here

  sleep(2);
}
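The reason this works is that the URL string is now rebuilt on every iteration. In the question, $url was assembled once before the loop, so later changes to $page never reached it. A minimal sketch of the difference, with no network access; buildPageUrl() is a hypothetical helper name, not something from the original post:

```php
<?php
// Hypothetical helper (not from the original post): builds the search URL for one page.
function buildPageUrl(int $page): string
{
    return 'http://www.imdb.com/search/title?genres=animation&page=' . $page;
}

// Built once, outside the loop: the string is evaluated here and never again.
$page = 1;
$fixedUrl = 'http://www.imdb.com/search/title?genres=animation&page=' . $page;

$urls = [];
for ($page = 1; $page <= 3; $page++) {
    // Rebuilt inside the loop, so each iteration gets its own page number.
    $urls[] = buildPageUrl($page);
}

// $fixedUrl still ends in "page=1", while $urls holds pages 1 to 3.
print $fixedUrl . "\n";
print implode("\n", $urls) . "\n";
```

In a real scraper, the fetch and parse step would go inside the loop body, right after the URL is built.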

UPDATE (no dupes):

<?php
for ($i = 1; $i <= 5; $i++) {
    $url = "http://www.imdb.com/search/title?genres=animation&page=$i";
    $page = file_get_contents($url);
    preg_match_all("/id=\"sb_(tt\d{7})/", $page, $idinfo, PREG_SET_ORDER);
    foreach ($idinfo as $idnumber) {
        $idnumber = $idnumber[1];
        echo $idnumber.'<br>';
    }
}
?>

You might consider putting a sleep between iterations as a polite measure, e.g. sleep(2); which pauses the script for 2 seconds.
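If the same ID could show up more than once, one option is to collect the matches and filter them explicitly. The sketch below is only an illustration: extractIds() is a hypothetical helper, the sample markup is made up to stand in for a downloaded page, and the id="sb_tt..." pattern is taken from the code above and may not match what the site serves today.

```php
<?php
// Hypothetical helper: pulls IMDB IDs out of one page of HTML using the
// pattern from the answer above, and drops repeated IDs.
function extractIds(string $html): array
{
    preg_match_all('/id="sb_(tt\d{7})/', $html, $matches);
    // $matches[1] holds the captured "tt1234567" parts; array_unique keeps
    // the first occurrence of each, array_values reindexes from 0.
    return array_values(array_unique($matches[1]));
}

// Made-up sample markup standing in for a downloaded page.
$sample = '<span id="sb_tt0000001"></span>'
        . '<span id="sb_tt0000002"></span>'
        . '<span id="sb_tt0000001"></span>';

$ids = extractIds($sample);
print implode("<br />", $ids);
```

Inside the page loop, the helper would be called on the result of file_get_contents($url), with sleep(2) after each page.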

Answer 1: 0, accepted, 2018-03-02 18:32:23
