Look inside pattern if parent pattern matches and share chars between patterns
Search for multiple patterns inside a pattern
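I'm using regular expressions to extract data from a website, but I've run into a problem.
This is part of the raw HTML I want to parse. I want to extract the text after "descuentos-" in each link's href, along with the city name inside each <a> element.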
<div id="cities2_2">
<a href = "http://website.com/descuentos-espana/">Badajoz</a>
<a href = "http://website.com/descuentos-espana/">Badalona</a>
<a href = "http://website.com/descuentos-barcelona/">Barcelona</a>
<a href = "http://website.com/descuentos-bilbao/">Bilbao</a>
<a href = "http://website.com/descuentos-espana/">Burgos</a>
</div>
</div>
<div class="capa_cities" onmouseover="act_formato(3, 2);"
onmouseout="desact_formato(3, 2);">
<h2 id="title_city3_2">C</h2>
<div id="cities3_2">
<a href = "http://website.com/descuentos-espana/">Cáceres</a>
<a href = "http://website.com/descuentos-cadiz/">Cádiz</a>
<a href = "http://website.com/descuentos-espana/">Cartagena</a>
<a href = "http://website.com/descuentos-espana/">Castellón</a>
<a href = "http://website.com/descuentos-espana/">Ceuta</a>
<a href = "http://website.com/descuentos-espana/">Ciudad Real</a>
<a href = "http://website.com/descuentos-cordoba/">Córdoba</a>
<a href = "http://website.com/descuentos-espana/">Cuenca</a>
I could search for <a href = "http://website.com/descuentos-(.*)">, but other links elsewhere on the site also match that pattern. So this is the pattern I have now:
#<div id="cities[0-9]+_2">(<a href = "http://website.com/descuentos-(.*?)/">(.*?)</a>)*#
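I want to make it recursive. That is, for every <a href = "http://website.com/descuentos-(.*?)/">(.*?)</a> found inside the div, I want to capture the two inner (.*?) groups.
Is there a way to do this in a single regular expression, or do I have to process the result again with preg_match_all?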
Option 1: the quick way. Yes, use preg_match_all():
preg_match_all('#<a href = "http://website.com/descuentos-(.*?)/">(.*?)</a>#', $str, $matches);
echo "<pre>";
print_r($matches);
echo "</pre>";
Output (as seen in a browser, where the full <a> matches in [0] show up as their link text):
Array
(
[0] => Array
(
[0] => Badajoz
[1] => Badalona
[2] => Barcelona
[3] => Bilbao
[4] => Burgos
[5] => Cáceres
[6] => Cádiz
[7] => Cartagena
[8] => Castellón
[9] => Ceuta
[10] => Ciudad Real
[11] => Córdoba
[12] => Cuenca
)
[1] => Array
(
[0] => espana
[1] => espana
[2] => barcelona
[3] => bilbao
[4] => espana
[5] => espana
[6] => cadiz
[7] => espana
[8] => espana
[9] => espana
[10] => espana
[11] => cordoba
[12] => espana
)
[2] => Array
(
[0] => Badajoz
[1] => Badalona
[2] => Barcelona
[3] => Bilbao
[4] => Burgos
[5] => Cáceres
[6] => Cádiz
[7] => Cartagena
[8] => Castellón
[9] => Ceuta
[10] => Ciudad Real
[11] => Córdoba
[12] => Cuenca
)
)
Time elapsed: 0.000104904174805
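The pattern above matches every "descuentos-" link on the page, while the question only wants links inside the cities blocks. One way to narrow it down is the two-pass approach the question hints at: first isolate each block, then match the links inside it. The following is only a sketch. The sample HTML, the extra "Banner" link, and the helper name extractCities are made up for illustration, and it assumes the city blocks contain no nested <div> elements.

```php
<?php
// Hypothetical sample input: a cut-down version of the HTML from the question,
// plus one unrelated link outside the city blocks that must be ignored.
$html = <<<'HTML'
<a href = "http://website.com/descuentos-madrid/">Banner</a>
<div id="cities2_2">
<a href = "http://website.com/descuentos-espana/">Badajoz</a>
<a href = "http://website.com/descuentos-barcelona/">Barcelona</a>
</div>
<div id="cities3_2">
<a href = "http://website.com/descuentos-cadiz/">Cádiz</a>
</div>
HTML;

/**
 * Hypothetical helper: returns [slug, city] pairs for links inside city blocks only.
 */
function extractCities(string $html): array
{
    $pairs = [];
    // Pass 1: isolate each <div id="citiesN_2">...</div> block.
    // The s flag lets the dot match newlines; assumes no nested <div> inside a block.
    preg_match_all('#<div id="cities\d+_2">(.*?)</div>#s', $html, $blocks);
    foreach ($blocks[1] as $block) {
        // Pass 2: within one block, capture the slug and the city of every link.
        preg_match_all(
            '#<a href = "http://website\.com/descuentos-([^/"]+)/">([^<]*)</a>#',
            $block,
            $links,
            PREG_SET_ORDER
        );
        foreach ($links as $link) {
            $pairs[] = [$link[1], $link[2]];
        }
    }
    return $pairs;
}

foreach (extractCities($html) as [$slug, $city]) {
    echo $city, ' - ', $slug, PHP_EOL;
}
```

With PREG_SET_ORDER each entry of $links holds one link's full match, slug, and city together, so the two captures stay paired without indexing into parallel arrays.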
Option 2: a DOM parser ($str is your text):
$dom = new DOMDocument();
$dom->loadHTML($str); // $str holds the HTML to parse
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    $href = $link->getAttribute('href');
    // Skip links whose href has no "descuentos-<name>/" part
    if (!preg_match('#descuentos-([^/]+)/#', $href, $match)) {
        continue;
    }
    echo $href." ### "; // prints the href
    echo $link->nodeValue." - ".$match[1]."<br/>";
}
Output (add a UTF-8 header to see the accented characters correctly):
http://website.com/descuentos-espana/ ### Badajoz - espana
http://website.com/descuentos-espana/ ### Badalona - espana
http://website.com/descuentos-barcelona/ ### Barcelona - barcelona
http://website.com/descuentos-bilbao/ ### Bilbao - bilbao
http://website.com/descuentos-espana/ ### Burgos - espana
http://website.com/descuentos-espana/ ### Cáceres - espana
http://website.com/descuentos-cadiz/ ### Cádiz - cadiz
http://website.com/descuentos-espana/ ### Cartagena - espana
http://website.com/descuentos-espana/ ### Castellón - espana
http://website.com/descuentos-espana/ ### Ceuta - espana
http://website.com/descuentos-espana/ ### Ciudad Real - espana
http://website.com/descuentos-cordoba/ ### Córdoba - cordoba
http://website.com/descuentos-espana/ ### Cuenca - espana
Time elapsed: 0.000319004058838
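The DOM loop above visits every <a> on the page. If only the links inside the cities blocks are wanted, as in the question, an XPath query can do the filtering. This is a minimal sketch, not a tested drop-in. The sample HTML and the helper name citiesFromDom are hypothetical, and the XML encoding prefix is a commonly used workaround for getting loadHTML() to treat the input as UTF-8.

```php
<?php
// Hypothetical sample input, reduced from the HTML in the question,
// with one unrelated link outside the city block.
$html = <<<'HTML'
<a href = "http://website.com/descuentos-madrid/">Banner</a>
<div id="cities3_2">
<a href = "http://website.com/descuentos-espana/">Cáceres</a>
<a href = "http://website.com/descuentos-cadiz/">Cádiz</a>
</div>
HTML;

/**
 * Hypothetical helper: returns [slug, city] pairs for links that are direct
 * children of a div whose id starts with "cities".
 */
function citiesFromDom(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Assumption: the input is UTF-8; the prefix hints that to the parser.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $pairs = [];
    foreach ($xpath->query('//div[starts-with(@id, "cities")]/a') as $link) {
        // Pull the part after "descuentos-" out of the href.
        if (preg_match('#descuentos-([^/]+)/#', $link->getAttribute('href'), $m)) {
            $pairs[] = [$m[1], trim($link->nodeValue)];
        }
    }
    return $pairs;
}

foreach (citiesFromDom($html) as [$slug, $city]) {
    echo $city, ' - ', $slug, PHP_EOL;
}
```

Filtering in XPath keeps the regular expression down to the one job it does well here, extracting the slug from a single href, and leaves the document structure to the parser.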