
DomXPath with DOMDocument to get <img> Class URL

I am writing a small scraper script that finds the URL of an image with a particular class name. My cURL request and DOMDocument are working fine, and as far as I can tell so is DOMXPath (there are no errors), but I am struggling to work out how to get the URL out of the XPath query results.

My code so far:

$dom = new DOMDocument();
@$dom->loadHTML($x);

$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="productImage"]');


var_dump($div);
echo $div->item(0);

If I var_dump($x), the page is output without a problem, so cURL is working fine. But I do not know how to get at the data contained in $div. I am trying to find an image with a class of 'productImage', which looks like this:

<img src="/uploads/5W/yP/5WyPP4l7Z-jmZRzu_MJ6zg/1077-d.jpg" border="1" alt="Album" class="productImage">

I want the source of that image tag.

Any suggestions?

$dom = new DOMDocument();
$dom->loadHTML($x);

$xpath = new DomXpath($dom);
$imgs  = $xpath->query('//*[@class="productImage"]');

foreach($imgs as $img)
{
    echo 'ImgSrc: ' . $img->getAttribute('src') .'<br />' . PHP_EOL;
}

Try that...
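For anyone who wants to try the loop above without a live page, here is a minimal self-contained sketch. The $html string is made-up sample markup standing in for the cURL response ($x in the question), and libxml_use_internal_errors() is used as one possible alternative to the @ operator for silencing parse warnings on messy HTML. Treat it as an illustration, not the only way to do it.

```php
<?php
// Hypothetical sample markup standing in for the cURL response ($x in the question).
$html = '<html><body>'
      . '<img src="/uploads/5W/yP/5WyPP4l7Z-jmZRzu_MJ6zg/1077-d.jpg" border="1" alt="Album" class="productImage">'
      . '<img src="/other.jpg" class="thumb">'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Gather the src attribute of every <img> whose class attribute is exactly "productImage".
$srcs = [];
foreach ($xpath->query('//img[@class="productImage"]') as $img) {
    $srcs[] = $img->getAttribute('src');
}

print_r($srcs);
```

Note that the value you get back is whatever is in the attribute, so a relative path such as /uploads/... stays relative.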

== EDIT: Additional Info ==

The reason I use a loop here is that the query may match more than one img. If you know there is only one element (or you only want the first DOM node found), you can access that element through the item() method of the DOMNodeList, like so:

$dom = new DOMDocument();
$dom->loadHTML($x);

$xpath = new DomXpath($dom);
$img   = $xpath->query('//*[@class="productImage"]');

echo 'ImgSrc: ' . $img->item(0)->getAttribute('src') .'<br />' . PHP_EOL;
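One caveat with item(0): it returns null when nothing matched, so calling getAttribute() on it directly will fail on pages that lack the image. A hedged sketch of a guard follows. The function name firstProductImageSrc is made up for this example, and it assumes you pass in a non-empty HTML string.

```php
<?php
// Hypothetical helper: returns the src of the first <img class="productImage">,
// or null when no such element exists. Assumes $html is a non-empty string.
function firstProductImageSrc(string $html): ?string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//img[@class="productImage"]');

    // query() returns false for an invalid expression; item(0) is null on an empty list.
    $first = ($nodes === false) ? null : $nodes->item(0);

    return ($first instanceof DOMElement) ? $first->getAttribute('src') : null;
}

var_dump(firstProductImageSrc('<img src="/a.jpg" class="productImage">'));
var_dump(firstProductImageSrc('<p>no image here</p>'));
```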

You don't actually need XPath here: it seems you're only after images, and that can be done with DOMDocument::getElementsByTagName() followed by a simple filter:

foreach ($dom->getElementsByTagName('img') as $image) {
    $class = $image->getAttribute('class');
    if (strpos(" $class ", " productImage ") !== false) {
        $url = $image->getAttribute('src');
        // do stuff
    }
}

Then, you can get the src attribute by using DOMElement::getAttribute():

echo $image->getAttribute('src');
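If you would rather stay with XPath, the same whole-word class check as the strpos() filter above can be written inside the query. This is a sketch using the common concat/normalize-space idiom from XPath 1.0. The sample markup is invented for the example. Unlike @class="productImage", it also matches elements that carry additional classes.

```php
<?php
// Hypothetical sample markup: one multi-class match, one near miss, one exact match.
$html = '<html><body>'
      . '<img src="/a.jpg" class="productImage large">'
      . '<img src="/b.jpg" class="productImageSmall">'
      . '<img src="/c.jpg" class="productImage">'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces, then look for the padded class name,
// so "productImage" matches as a whole token and "productImageSmall" does not.
$query = '//img[contains(concat(" ", normalize-space(@class), " "), " productImage ")]';

$matches = [];
foreach ($xpath->query($query) as $img) {
    $matches[] = $img->getAttribute('src');
}

print_r($matches);
```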
