
Getting HREF Values from content on page

I am getting data from a page that is formatted like this:

<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="1">DATA I WANT HERE</span> 
<a href="https://URL.COM/">CLICK</a> 
<a href="https://URL.COM/">MORE RANDOM DATA</a>
</span>
<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="2">DATA I WANT HERE</span> 
<a href="https://URL.COM/RANDOM">CLICK</a> 
<a href="https://URL.COM/RANDOM">MORE RANDOM DATA</a>
</span>

How can I get the href values from the page?

Here is the code I use to get the data from the span ID, but I don't know how to do the same for the href, as the links have no name or id:

$doc = new DOMDocument();
@$doc->loadHTML($html2); // @ suppresses warnings from malformed HTML
foreach ($doc->getElementsByTagName('span') as $element) {
    // Only handle spans that carry a non-empty id attribute
    if (!empty($element->attributes->getNamedItem('id')->value)) {
        // $f, $i and $start are presumably set earlier (not shown)
        $filename = 'newpks/' . $f . '.txt';
        $file = fopen($filename, "a");

        $data = $element->attributes->getNamedItem('id')->value . PHP_EOL;
        fwrite($file, $data);
        fclose($file);
        $i++;
        $end = $start;
    }
}

I assume you're only interested in links with an href attribute, in which case we know the tags will be of type a. This should be sufficient (I haven't been able to test the code, though).

I simplified the code a bit: the items returned by getElementsByTagName are DOMElement objects (DOMElement inherits from DOMNode), so you can use hasAttribute and getAttribute instead of going through attributes->getNamedItem.

foreach($doc->getElementsByTagName('a') as $element ) { 
    if ($element->hasAttribute('href')) { 
        $href = $element->getAttribute('href');
        // Do your work here
    }
}
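If you need to know which outer span each link belongs to, the same loop can be nested. The following is only a sketch under a few assumptions: the markup is a trimmed copy of the sample from the question, the helper name collectHrefsBySpan is made up for illustration, and libxml_use_internal_errors is used in place of the @ operator to silence parser warnings (such as the repeated id).

```php
<?php
// Sample markup modelled on the question; a real page will differ.
$html = <<<HTML
<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="1">DATA I WANT HERE</span>
 <a href="https://URL.COM/">CLICK</a>
</span>
<span id="RANDOMINFO">
 <a href="/DEMO/RANDOMDATA">+</a>
 <span title="2">DATA I WANT HERE</span>
 <a href="https://URL.COM/RANDOM">CLICK</a>
</span>
HTML;

// Hypothetical helper: returns one entry per <span id="..."> with the
// href values of the <a> elements nested inside it.
function collectHrefsBySpan(string $html): array
{
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        if (!$span->hasAttribute('id')) {
            continue; // skips the inner <span title="..."> elements
        }
        $hrefs = [];
        // Searching from $span only returns anchors below that span.
        foreach ($span->getElementsByTagName('a') as $a) {
            if ($a->hasAttribute('href')) {
                $hrefs[] = $a->getAttribute('href');
            }
        }
        $rows[] = ['id' => $span->getAttribute('id'), 'hrefs' => $hrefs];
    }
    return $rows;
}

foreach (collectHrefsBySpan($html) as $row) {
    echo $row['id'], ': ', implode(', ', $row['hrefs']), PHP_EOL;
}
```

Calling getElementsByTagName on the span rather than on the document keeps each link tied to its parent, which may be handy if you write one file per span as in your original code.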
