Regex - Find the match that is inside a match
If the string is:
<li>Your browser may be missing a required plug-in contained in <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a>. Please reload this page after installing the missing component.<br />If this error persists, you can also save a copy of <a href="test.pdf">
and the regex I wrote is:
/href=.*?.pdf/
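This matches from the first "href" through to ".pdf", but I need the match to start at the second href. In other words, it should capture only the href whose value ends in .pdf.
How can I solve this with a regular expression?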
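For context on why this happens: a regex engine reports the leftmost match, so the lazy `.*?` only shortens where the match ends, not where it begins. The unescaped `.` before `pdf` also matches any character rather than a literal dot. Below is a minimal sketch of a tighter pattern, assuming attribute values are always double-quoted; the shortened input string, the variable names, and the pattern itself are illustrative and not from the original post.

```php
<?php
// Shortened version of the sample input (the last <a> is left unclosed, as in the question).
$html = 'contained in <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a>. '
      . 'You can also save a copy of <a href="test.pdf">';

// Original pattern: the match begins at the first href and runs on to ".pdf".
preg_match('/href=.*?.pdf/', $html, $loose);

// Tighter pattern: [^"]* cannot cross a closing quote, so the match stays
// inside a single attribute value; \. matches a literal dot.
preg_match_all('/href="([^"]*\.pdf)"/i', $html, $strict);

echo $loose[0], PHP_EOL;   // starts at the first href
print_r($strict[1]);       // only the value that ends in .pdf
```

This only works while the markup stays this regular; single-quoted or unquoted attributes would slip past it.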
You should parse HTML or XML with a DOM parser rather than with regular expressions. In PHP, there is the DOMDocument class:
$doc = new DOMDocument();
$doc->loadHTML('<li>Your browser may be missing a required plug-in contained in <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a>. Please reload this page after installing the missing component.<br />If this error persists, you can also save a copy of <a href="http://www.police.vt.edu/VTPD_v2.1/crime_stats/crime_logs/data/VT_2011-01_Crime_Log.pdf">');
$links = $doc->getElementsByTagName('a');
foreach ($links as $link) {
    // Print each href on its own line
    echo $link->getAttribute('href'), PHP_EOL;
}
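Since the question only wants links that end in .pdf, the loop above can be narrowed with a suffix check. Here is a minimal sketch, assuming that a case-insensitive `.pdf` suffix on the URL path is what identifies a PDF link; the helper name `pdfLinks` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: return the href values whose path ends in ".pdf".
function pdfLinks(string $html): array
{
    // Collect parse warnings internally; real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // Check the path only, so a query string does not hide the extension.
        $path = parse_url($href, PHP_URL_PATH);
        if (is_string($path) && preg_match('/\.pdf$/i', $path) === 1) {
            $result[] = $href;
        }
    }
    return $result;
}

$html = '<li>See <a href="http://get.adobe.com/reader/">Adobe Acrobat Reader</a> '
      . 'or save <a href="test.pdf">a copy</a>.</li>';
print_r(pdfLinks($html));
```

Checking the parsed path rather than the raw attribute is a design choice: it keeps a link such as `report.pdf?download=1` in the result, which a plain end-of-string test would miss.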