How to get image URLs page by page in PHP

Question

This is my code:

<form method="POST">
    <input name="link">
    <button type="submit">></button>
</form>
<title>GET IMAGE URL</title>
<?php 
if (!isset($_POST['link'])) exit();
$link = $_POST['link'];
$parse = explode('.html', $link);
echo '<div id="pin" style="float:center"><textarea class="text" cols="110" rows="50">';
for ($i = 1; $i <=5; $i++)
{
    if ($i > 1)
    $link = "$parse[0]-$i.html";
    $get = file_get_contents($link);
    if (preg_match_all('/src="(.*?)"/', $get, $matches))
    {
        foreach ($matches[1] as $content)
        echo $content."\r\n";
    }
}
echo '</textarea>';

The site I'm trying to get the img src values from spreads each gallery over 10 to 15 pages, so I want my code to collect all the image URLs up to the last page. How can I do that without the loop?

If I use:

for ($i = 1; $i <=5; $i++)

this gets the image URLs of only 5 pages, but I want it to continue until the last page. Then I won't need to edit the loop every time I submit another URL with a different number of pages.

Answer 1

From this:

this gets the image URLs of only 5 pages, but I want it to continue until the last page. Then I won't need to edit the loop every time I submit another URL with a different number of pages.

As I understand it, your problem is that the number of pages varies. Your pages have a next-page link at the bottom.

Detect that link and fetch the images in a while loop:

<?php

// Link given in the form
$link = "http://www.xiumm.org/photos/XiuRen-17305.html";
$parse = explode('.html', $link);
$i = 1;
// Initialize a boolean flag that keeps the loop running
$nextPageFound = true;

while ($nextPageFound) {
    // Construct the URL for the current page
    if ($i == 1) {
        $url = "$parse[0].html";
        echo "First Page<br><br>";
    } else {
        $url = "$parse[0]-$i.html";
    }

    // Fetch the page contents; stop if the request fails
    $get = file_get_contents($url);
    if ($get === false) {
        echo "Could not load $url";
        break;
    }

    // Echo every src value found on the page
    if (preg_match_all('/src="(.*?)"/', $get, $matches)) {
        foreach ($matches[1] as $content) {
            echo $content . "<br>";
        }
    }

    // Check whether the page contains a nextPageBtn
    if (strpos($get, '"nextPageBtn"') !== false) {
        // Move on to the next page
        $i++;
        echo "<br>Page $i<br><br>";
    } else {
        $nextPageFound = false;
        echo "THE END";
    }
}
?>
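
The same idea can be wrapped in a function, which makes it easier to reuse and to try out without hitting the network. This is only a sketch under a few assumptions: the function name collectImageUrls, the $maxPages safety cap, and the example.test URLs are made up for illustration, and the "nextPageBtn" marker is taken from the code above.

```php
<?php
// Hypothetical helper: collects src values page by page until a page has no
// "nextPageBtn" marker, a fetch fails, or $maxPages is reached.
// $fetch is any callable that returns the HTML for a URL, or false on failure.
function collectImageUrls(string $firstUrl, callable $fetch, int $maxPages = 50): array
{
    $base = explode('.html', $firstUrl)[0];
    $urls = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        // Page 1 is "<base>.html", later pages are "<base>-N.html"
        $url = $page === 1 ? "$base.html" : "$base-$page.html";

        $html = $fetch($url);
        if ($html === false) {
            break; // stop on a failed request
        }

        if (preg_match_all('/src="(.*?)"/', $html, $matches)) {
            foreach ($matches[1] as $src) {
                $urls[] = $src;
            }
        }

        // No next-page button means this was the last page
        if (strpos($html, '"nextPageBtn"') === false) {
            break;
        }
    }

    return $urls;
}

// Canned pages stand in for the remote site (made-up markup and URLs)
$pages = [
    'http://example.test/gallery.html'   => '<img src="a.jpg"><a class="nextPageBtn">next</a>',
    'http://example.test/gallery-2.html' => '<img src="b.jpg"><img src="c.jpg">',
];
$fakeFetch = function (string $url) use ($pages) {
    return $pages[$url] ?? false;
};

$result = collectImageUrls('http://example.test/gallery.html', $fakeFetch);
print_r($result);
```

Against the real site, passing 'file_get_contents' as the second argument should behave like the loop above; the cap only guards against a page that always claims to have a successor.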

Answer 2

You should use an HTML/XML parser, like DOMDocument, in combination with DOMXPath (XPath is a query language for (X)HTML data structures):

// create DOMDocument
$doc = new DOMDocument();
// load remote HTML file
$doc->loadHTMLFile( $link );

// create DOMXPath
$xpath = new DOMXPath( $doc );
// fetch all IMG elements that have a src attribute
$nodes = $xpath->query( '//img[@src]' );

// loop through the found IMG elements and echo their src attribute values
for( $i = 0; $i < $nodes->length; $i++ ) {
  echo $nodes->item( $i )->getAttribute( 'src' ) . PHP_EOL;
}

Regarding the XPath query //div[contains(@class,'pic_box')]//@src, mentioned by @Enuma in the comments:

The resulting DOMNodeList of that query will not contain DOMElement objects, but DOMAttr objects, because the query directly asks for attributes, not elements. Since DOMAttr represents an attribute and not an element, the method getAttribute() does not exist on it. To get the value of the attribute, you have to use the property DOMAttr->value.

So we have to slightly alter the relevant part of the example code above to:

// loop through the found src attributes and echo their values
for( $i = 0; $i < $nodes->length; $i++ ) {
  echo $nodes->item( $i )->value . PHP_EOL;
}
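
The difference between the two query styles can be checked locally on a small inline document. This is a sketch only: the HTML string below is invented for the example and merely imitates a div with the class pic_box.

```php
<?php
// Inline HTML stands in for a fetched page; the markup is made up for this example.
$html = '<html><body>'
      . '<div class="pic_box"><img src="one.jpg"><img src="two.jpg"></div>'
      . '<img src="outside.jpg">'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Query for elements: each node is a DOMElement, so getAttribute() works.
$elementSrcs = [];
foreach ($xpath->query('//img[@src]') as $img) {
    $elementSrcs[] = $img->getAttribute('src');
}

// Query for attributes: each node is a DOMAttr, so read ->value instead.
$attrSrcs = [];
$attrIsDomAttr = true;
foreach ($xpath->query("//div[contains(@class,'pic_box')]//@src") as $attr) {
    $attrIsDomAttr = $attrIsDomAttr && ($attr instanceof DOMAttr);
    $attrSrcs[] = $attr->value;
}

print_r($elementSrcs);
print_r($attrSrcs);
```

The first query returns every img on the page, while the second is limited to src attributes inside the pic_box div, which is why the image outside the div only appears in the first list.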

Putting it all together, the example code then becomes:

// create DOMDocument
$doc = new DOMDocument();
// load remote HTML file
$doc->loadHTMLFile( $link );

// create DOMXPath
$xpath = new DOMXPath( $doc );
// fetch all src attributes that are descendants of div.pic_box
// (double quotes outside, so the single quotes inside the XPath do not end the PHP string)
$nodes = $xpath->query( "//div[contains(@class,'pic_box')]//@src" );

// loop through the found src attributes and echo their values
for( $i = 0; $i < $nodes->length; $i++ ) {
  echo $nodes->item( $i )->value . PHP_EOL;
}

PS: For DOMDocument to be able to load remote files, I believe some PHP config setting may need to be set, but I don't know which one off the top of my head. Since it already appeared to be working for @Enuma, it's not actually relevant now. Perhaps I'll look it up later.
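
As a possible follow-up to the PS: the setting in question is most likely allow_url_fopen, which controls whether PHP's URL-aware stream wrappers are available. The sketch below assumes that this is the relevant setting for loadHTMLFile(); the function names remoteLoadingEnabled and loadRemoteDocument are hypothetical.

```php
<?php
// Returns true when allow_url_fopen is enabled in the current PHP configuration.
function remoteLoadingEnabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical wrapper: loads a remote page into a DOMDocument,
// or returns null when remote loading is disabled or the load fails.
function loadRemoteDocument(string $url): ?DOMDocument
{
    if (!remoteLoadingEnabled()) {
        return null;
    }

    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadHTMLFile($url);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

echo remoteLoadingEnabled()
    ? "allow_url_fopen is on" . PHP_EOL
    : "allow_url_fopen is off" . PHP_EOL;
```

Real-world pages often contain markup that libxml complains about, so silencing the parse warnings this way tends to keep the output readable.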

Answer 1
1, accepted, 2017-05-16 08:22:31

Answer 2
0, 2017-05-16 08:05:38
