Capture PHP links without image links

$url = 'http://www.test.com/';
$dom = new DOMDocument;
@$dom->loadHTMLFile($url);

$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    // ... (loop body not shown in the original post)
}

I am currently using the above script to capture the links on a page, but I found that it always returns duplicate links. On the page, each linked picture is followed by a text link that goes to the same URL. Is there an easy way to capture just the text link, not the image link?

As I was saying, I might take the approach of cleaning up the duplicates in the result set. I'm not sure what you are scraping, but what if a link is only used with an image?

You could even count the occurrences.

$url = 'http://www.test.com/';
$dom = new DOMDocument;
@$dom->loadHTMLFile($url);

$links = $dom->getElementsByTagName('a');
$distinctLinks = [];
foreach ($links as $link) {
    $href = $link->getAttribute('href'); // a DOMElement cannot be an array key, so key by its href
    $distinctLinks[$href] = ($distinctLinks[$href] ?? 0) + 1;
}
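
If the goal is to skip the image links outright, as the question asks, one possible approach is to filter the anchors with XPath. The following is only a minimal sketch: the inline `$html` sample and the `$textLinks` name are assumptions made for illustration, and it uses `loadHTML` on a string so it runs without fetching a real page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
<a href="/post/1"><img src="thumb1.png" alt=""></a>
<a href="/post/1">First post</a>
<a href="/post/2"><img src="thumb2.png" alt=""></a>
<a href="/post/2">Second post</a>
</body></html>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of silencing them with @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Keep only anchors that have an href and contain no <img> descendant.
$textLinks = [];
foreach ($xpath->query('//a[@href][not(.//img)]') as $a) {
    // Keying by href also collapses any remaining duplicates.
    $textLinks[$a->getAttribute('href')] = trim($a->textContent);
}

print_r($textLinks);
```

Note that this filter drops any link that appears only around an image, which is the concern raised above; counting occurrences instead keeps those links in the result.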
