Cannot extract href value from anchor tag
I am trying to get the href value from this HTML:
<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149" style="background-color: rgb(255, 255, 255);">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-image-click']);_gaq.push(['second._trackEvent','Click','search','watch-image-click']);" class="pic ">
<span style="position:absolute">
<img width="100" height="100" alt="Rolex Submariner Date" src="" class="photo">
</span>
</span>
<span class="disc">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-headline-click']);_gaq.push(['second._trackEvent','Click','search','watch-headline-click']);" class="watch-headline"><span class="underline">Rolex Submariner Date</span></span>
<span class="spec">
<span onmouseover="$('#infobox-title').text('Germany');$('#infobox-text').text('This dealer is from Augsburg, Germany.')" style="width: 21px;" class="flag">
<img width="16" height="16" alt="" src="http://cdn.chrono24.com/images/flags-icons/DE.png">
</span>
<span class="icon i-hasnostore"></span>
<span onmouseover="$('#infobox-title').text('Trusted Seller since 2004');$('#infobox-text').text('We have no knowledge about pending/unsolved disputes or complaints about this seller.')" class="icon i-trusted"></span>
<span onmouseover="$('#infobox-title').text('Retailer recommendations');$('#infobox-text').text('This watch retailer is recommended on Chrono24 by 1 other watch retailers.')" class="i-buddies">
<span class="icon buddie-count">1</span>
<span class="icon i-star-blue"></span>
</span>
<span onmouseover="$('#infobox-title').text('Trusted Seller since 2004');$('#infobox-text').text('We have no knowledge about pending/unsolved disputes or complaints about this seller.')" class="trustedseller">
<script type="text/javascript">
// <![CDATA[
document.write('Trusted Seller since 2004');
// ]]>
</script>Trusted Seller since 2004
</span>
<span style="width: 2px;" class="icon"></span>
<span onmouseover="$('#infobox-title').text('Premium Seller');$('#infobox-text').text('The Chrono24 Premium Seller Package is only available for Trusted Sellers who frequently use Chrono24.')" class="icon i-premium"></span>
<span onmouseover="$('#infobox-title').text('Premium Seller');$('#infobox-text').text('The Chrono24 Premium Seller Package is only available for Trusted Sellers who frequently use Chrono24.')" class="premiumseller">Premium</span>
</span>
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-desc-click']);_gaq.push(['second._trackEvent','Click','search','watch-desc-click']);" class="description">
Ref. No. 116610 LN; Steel; Automatic; Condition 0 (unworn); Year 2013; With Box; With Papers; Location: Germany, Augsburg; The current, the manufacturer's recommended retail price is 6800 Euro
</span>
<span class="availability">Availability: Available immediately</span>
</span>
<span class="pricebox">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-price-click']);_gaq.push(['second._trackEvent','Click','search','watch-price-click']);" class="amount price"><span class="large">$ 7,961</span>
</span>
<span class="buttonbox">
<span onclick="_gaq.push(['first._trackEvent','Click','search','watch-button-click']);_gaq.push(['second._trackEvent','Click','search','watch-button-click']);" class="button-blue">
<span>
Watch details
</span>
</span>
</span>
</span>
</a>
preg_match_all('#<a href="(.+)">#',$html,$urlarr);
This does not return the href value at all. I don't know what is going wrong.
Don't use regular expressions on HTML; HTML is not regular!
You should take a look at SimpleXML and XPath; they are the perfect tools for the job: http://php.net/manual/en/simplexmlelement.xpath.php
E.g.:
// Note: SimpleXMLElement requires well-formed XML and throws on markup it cannot parse
$xml = new SimpleXMLElement($html);
// Select all "a" tags with href attributes
$links = $xml->xpath("//a[@href]");
// You probably want the first one
$href = (string) $links[0]["href"];
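One caveat: SimpleXMLElement expects well-formed XML, and the markup in the question contains an unclosed <img> tag, so the constructor will most likely throw on it. Below is a minimal sketch of one workaround, assuming it is acceptable to let DOMDocument repair the markup first and then pass the tree to SimpleXML with simplexml_import_dom. The shortened $html sample is only an illustration, not the full page.

```php
<?php
// Shortened, illustrative version of the markup from the question.
$html = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149">'
      . '<img width="100" height="100" alt="Rolex Submariner Date" src="" class="photo">'
      . '<span class="underline">Rolex Submariner Date</span></a>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);   // the HTML parser tolerates the unclosed <img>
libxml_clear_errors();

// Convert the repaired DOM tree into a SimpleXMLElement.
$xml = simplexml_import_dom($dom);

// Select all "a" elements that carry an href attribute.
$links = $xml->xpath('//a[@href]');
$href = (string) $links[0]['href'];

echo $href, PHP_EOL;
```

The cast to string matters because indexing a SimpleXMLElement returns another SimpleXMLElement, not a plain string.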
You should use DOMDocument instead of regexp:
$dom = new DOMDocument();
// Set options before loading the document
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html);
$link = $dom->getElementsByTagName("a");
// Collect the href of every <a> element
$links = array();
for ($i = 0; $i < $link->length; $i++) {
    $links[] = $link->item($i)->getAttribute("href");
}
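One detail that is easy to miss with this approach: getAttribute() returns an empty string when the attribute is absent, so anchors without an href would add empty entries to the list. A small sketch of one way to skip them, using a made-up two-link $html sample (the first anchor is hypothetical and only there to show the filter):

```php
<?php
// Illustrative input: one anchor without href, one with the href from the question.
$html = '<a name="top">anchor without href</a>'
      . '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm">Rolex</a>';

libxml_use_internal_errors(true);   // keep warnings about imperfect markup quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$links = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    // getAttribute() returns "" for a missing attribute, so check first.
    if ($a->hasAttribute('href')) {
        $links[] = $a->getAttribute('href');
    }
}

print_r($links);
```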
All the suggested DOM-based methods should work. If you want to use regex, you can try this:
preg_match_all('~<a (?>[^>h]++|\Bh|h(?!ref\b))*href\s*=\s*["\']?\K[^"\'>\s]++~i', $html, $matches);
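For context, the pattern in the question fails because it requires href to follow "<a " directly, while in the posted markup the class attribute comes first. The sketch below contrasts the two behaviours on a shortened sample; the looser pattern is a simplified illustration (it assumes double-quoted values and would also match attributes such as data-href), not a replacement for the pattern above.

```php
<?php
// Shortened, illustrative version of the opening tag from the question.
$html = '<a class="list-item clearfix" href="/en/rolex/submariner-date--id2334149.htm" id="watch-2334149">Rolex</a>';

// Pattern from the question: only matches when href is the first attribute.
$strict = preg_match_all('#<a href="(.+)">#', $html, $urlarr);

// Looser variant: allow other attributes before href and stop at the closing quote.
$loose = preg_match_all('#<a\s[^>]*?\bhref="([^"]*)"#i', $html, $found);

var_dump($strict);    // number of matches for the original pattern
print_r($found[1]);   // captured href values from the looser pattern
```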
If you want to match only the href in a tags that have list-item clearfix as the class attribute value, you can do this:
$pattern = <<<'LOD'
~
(?(DEFINE)
(?<class> \b class \s* = \s* (["']) list-item \s+ clearfix \g{-1} )
(?<href_value> [^"'\s>]++ )
(?<href_start> \b href \s*=\s* ["']? )
(?<href_end> ['"\s] )
(?<content> (?> [^>hc]++ | \B[hc] | h(?!ref\b) | c(?!lass\b) )* )
)
<a \s+
\g<content>
(?J)
(?>
\g<class> \g<content> \g<href_start> (?<href> \g<href_value> )
|
\g<href_start> (?<href> \g<href_value> ) \g<href_end> \g<content> \g<class>
)
~xi
LOD;
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
foreach($matches as $match) {
echo '<br>' . $match['href'];
}
Keep in mind that this is much easier with XPath:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$hrefs = $xpath->query("//a[@class='list-item clearfix']/@href");
foreach($hrefs as $href) {
print_r($href->nodeValue);
}
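The query above compares the whole class attribute string, so it depends on the exact order and spacing of the class names. If that cannot be guaranteed, a common XPath 1.0 idiom is to test each class as a whitespace-delimited token. A sketch under that assumption, with a made-up sample in which the classes appear in reverse order:

```php
<?php
// Illustrative markup: same classes as in the question, but in a different order.
$html = '<a class="clearfix list-item" href="/en/rolex/submariner-date--id2334149.htm">Rolex</a>'
      . '<a class="other" href="/ignored.htm">Other</a>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match each class as a separate token instead of comparing the full attribute string.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' list-item ')"
       . " and contains(concat(' ', normalize-space(@class), ' '), ' clearfix ')]/@href";

$hrefs = array();
foreach ($xpath->query($query) as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```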
It's a bad idea to use regular expressions to parse HTML (at least in this case). Use a DOM parser such as SimpleHTMLDOM for this purpose.
It's as easy as:
$html = str_get_html('...');
foreach($html->find('a') as $element)
echo $element->href;
Alternatively, you can load it from a file as well:
$html = file_get_html('...');
foreach($html->find('a') as $element)
echo $element->href;
This is also possible with the built-in DOM:
$dom = new DOMDocument();
$dom->loadHTML($html);
// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a"); //all <a> tags
$urlArray = array();
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$urlArray[] = $href->getAttribute('href');
}