How to get image URLs from every page in PHP
This is my code:
<form method="POST">
<input name="link">
<button type="submit">></button>
</form>
<title>GET IMAGE URL</title>
<?php
if (!isset($_POST['link'])) exit();
$link = $_POST['link'];
$parse = explode('.html', $link);
echo '<div id="pin" style="float:center"><textarea class="text" cols="110" rows="50">';
for ($i = 1; $i <=5; $i++)
{
if ($i > 1)
$link = "$parse[0]-$i.html";
$get = file_get_contents($link);
if (preg_match_all('/src="(.*?)"/', $get, $matches))
{
foreach ($matches[1] as $content)
echo $content."\r\n";
}
}
echo '</textarea></div>';
The page I'm trying to get the img src values from has 10 to 15 pages, so I want my code to get all the image URLs up to the last page. How can I do that without hard-coding the number of pages in the loop?
If I use:
for ($i = 1; $i <=5; $i++)
this only gets the image URLs from 5 pages, but I want it to continue until the last page. That way, I don't need to edit the loop every time I submit another URL with a different number of pages.
From this part of your question:
"this will get only 5 page img urls, but I want to make it get until the end. Then I don't need to edit the loop every time I submit another URL with a different number of pages."
I understand that your problem is the dynamic number of pages. Your pages have a next-page link at the bottom:
下一页 ("next page")
Detect that link and fetch your images in a while loop:
<?php
// Link given in the form (hard-coded here for testing; use $_POST['link'] in your page)
$link = "http://www.xiumm.org/photos/XiuRen-17305.html";
$parse = explode('.html', $link);
$i = 1;
// Initialize a boolean
$nextPageFound = true;
while ($nextPageFound) {
    // Construct the URL on every iteration: page 1 is the base URL, page N ends in "-N.html"
    if ($i == 1) {
        $url = "$parse[0].html";
        echo "First Page<br><br>";
    } else {
        $url = "$parse[0]-$i.html";
    }
    // Get the page contents; stop if the request fails
    $get = file_get_contents($url);
    if ($get === false) {
        echo "Could not load $url";
        break;
    }
    if (preg_match_all('/src="(.*?)"/', $get, $matches)) {
        // echo every src value found on this page
        foreach ($matches[1] as $content) {
            echo $content . "<br>";
        }
    }
    // Check whether this page has a next-page button
    if (strpos($get, '"nextPageBtn"') !== false) {
        $nextPageFound = true;
        // move on to the next page
        $i++;
        echo "<br>Page $i<br><br>";
    } else {
        $nextPageFound = false;
        echo "THE END";
    }
}
?>
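The same loop can be packaged as a function. This is a minimal sketch, not part of the original answer, assuming PHP 7.4+ (for the `fn` arrow function). The page fetcher is passed in as a callable so the logic can be tried offline, and `$maxPages` is a hypothetical safety cap in case the `nextPageBtn` marker never disappears. The name `collectImageSrcs` and the example.com pages are illustrative only.

```php
<?php
// Hypothetical helper: walk "-N.html" pages until one lacks the next-page marker.
// $fetch is any callable returning the page HTML as a string, or false on failure.
function collectImageSrcs(string $firstPageUrl, callable $fetch, int $maxPages = 100): array
{
    // "http://host/photos/XiuRen-17305.html" -> "http://host/photos/XiuRen-17305"
    $base = explode('.html', $firstPageUrl)[0];
    $srcs = [];

    for ($page = 1; $page <= $maxPages; $page++) {
        // Page 1 is the submitted URL; later pages follow the "-N.html" pattern
        $url = $page === 1 ? $firstPageUrl : "{$base}-{$page}.html";
        $html = $fetch($url);
        if ($html === false) {
            break; // request failed: stop rather than loop forever
        }
        if (preg_match_all('/src="(.*?)"/', $html, $matches)) {
            $srcs = array_merge($srcs, $matches[1]);
        }
        if (strpos($html, '"nextPageBtn"') === false) {
            break; // no next-page button: this was the last page
        }
    }
    return $srcs;
}

// Usage with a fake three-page site (assumed markup) instead of real HTTP requests
$fakeSite = [
    'http://example.com/gallery.html'   => '<img src="a.jpg"><a class="nextPageBtn">next</a>',
    'http://example.com/gallery-2.html' => '<img src="b.jpg"><a class="nextPageBtn">next</a>',
    'http://example.com/gallery-3.html' => '<img src="c.jpg">',
];
$fetch = fn(string $url) => $fakeSite[$url] ?? false;

$urls = collectImageSrcs('http://example.com/gallery.html', $fetch);
echo implode("\n", $urls), "\n";
```

Against the real site you would pass `'file_get_contents'` as the fetcher, e.g. `collectImageSrcs($_POST['link'], 'file_get_contents')`. Injecting the fetcher keeps the page-walking logic separate from the HTTP call.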
You should use an HTML/XML parser, like DOMDocument, in combination with DOMXPath (XPath is a query language for querying (X)HTML data structures):
// create DOMDocument
$doc = new DOMDocument();
// load remote HTML file
$doc->loadHTMLFile( $link );
// create DOMXPath
$xpath = new DOMXPath( $doc );
// fetch all IMG elements that have a src attribute
$nodes = $xpath->query( '//img[@src]' );
// loop through found IMG elements and echo their src attribute values
for( $i = 0; $i < $nodes->length; $i++ ) {
echo $nodes->item( $i )->getAttribute( 'src' ) . PHP_EOL;
}
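To try the query without fetching a remote page, here is a minimal sketch assuming the `dom` extension is enabled (it usually is). It calls `loadHTML()` on an inline sample string (hypothetical markup) instead of `loadHTMLFile()`, and `imgSrcs` is an illustrative helper name. The `libxml_use_internal_errors()` calls are an addition that keeps libxml's warnings about imperfect real-world markup out of the output.

```php
<?php
// Hypothetical helper: collect src values of all <img> elements in an HTML string.
function imgSrcs(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml warnings instead of printing them, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $srcs = [];
    // //img[@src] returns DOMElement nodes, so getAttribute() is available
    foreach ($xpath->query('//img[@src]') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<html><body><img src="one.jpg"><p>text</p><img alt="no src"><img src="two.png"></body></html>';
print_r(imgSrcs($html));
```

The `[@src]` predicate skips `<img>` tags without a src attribute, such as the `alt="no src"` one in the sample.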
Regarding the XPath query //div[contains(@class,'pic_box')]//@src that @Enuma mentioned in the comments:
The DOMNodeList returned by that query will not contain DOMElement objects but DOMAttr objects, because the query selects attributes directly, not elements. Since DOMAttr represents an attribute rather than an element, it has no getAttribute() method. To get the attribute's value, use the DOMAttr->value property.
So we have to slightly alter the relevant part of the example code above to:
// loop through found src attributes and echo their value
for( $i = 0; $i < $nodes->length; $i++ ) {
echo $nodes->item( $i )->value . PHP_EOL;
}
Putting it all together, our example code then becomes:
// create DOMDocument
$doc = new DOMDocument();
// load remote HTML file
$doc->loadHTMLFile( $link );
// create DOMXPath
$xpath = new DOMXPath( $doc );
// fetch all src attributes that are descendants of div.pic_box
// double quotes around the query, because it contains single quotes itself
$nodes = $xpath->query( "//div[contains(@class,'pic_box')]//@src" );
// loop through found src attributes and echo their value
for( $i = 0; $i < $nodes->length; $i++ ) {
echo $nodes->item( $i )->value . PHP_EOL;
}
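A minimal offline sketch of the attribute query, assuming the `dom` extension and hypothetical sample markup: it shows that each node returned by `//div[contains(@class,'pic_box')]//@src` is a DOMAttr read through `->value`. The helper name `picBoxSrcs` is illustrative only.

```php
<?php
// Hypothetical helper: src attribute values of anything inside div elements
// whose class contains "pic_box".
function picBoxSrcs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $srcs = [];
    foreach ($xpath->query("//div[contains(@class,'pic_box')]//@src") as $attr) {
        // Each node is a DOMAttr: it has ->value but no getAttribute()
        $srcs[] = $attr->value;
    }
    return $srcs;
}

$html = '<div class="header"><img src="logo.png"></div>'
      . '<div class="pic_box"><img src="photo1.jpg"><img src="photo2.jpg"></div>';
print_r(picBoxSrcs($html));
```

Note that `contains()` is a substring test, so a class such as `pic_box_wide` would also match; the logo outside the `pic_box` div is excluded.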
PS: For DOMDocument to be able to load remote files, I believe some PHP configuration setting may need to be enabled, which I don't remember off the top of my head right now. But since it already appeared to be working for @Enuma, it's not actually relevant now. Perhaps I'll look it up later.