Choose multiple class names and get the child nodes inside that class with PHP DOMXPath
I am parsing an HTML file with the following structure:
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<span class="emp-loc-part2 infLoc">Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Get_text 6</li>
<li class="txtArea emp-un-area">Get_text 7</li>
<li class="txtToilet emp-un-bath">Get_text 8</li>
<li class="txtCar emp-un-park">Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
</div>
</div>
</div>
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
<span class="emp-loc-part2 infLoc">Other Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Other Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Other Get_text 6</li>
<li class="txtArea emp-un-area">Other Get_text 7</li>
<li class="txtToilet emp-un-bath">Other Get_text 8</li>
<li class="txtCar emp-un-park">Other Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
</div>
</div>
</div>
The following block:
<div class="lstImv blackBd12"></div>
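wraps the other tags that hold the text contents I am after, and it is repeated several times (in the example above, after editing, I kept only 2 of them).
Then, with this code: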
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
echo "<pre>";
print_r($span);
echo "</pre>";
}
?>
I get 2 objects with their values:
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
)
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
)
So my approach is:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
echo "Key 7 : ".$span->textContent."<br/>";
}
?>
I get the data in this form:
Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 :
Key 2 :
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9
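In other words, it iterates key by key, but I want the keys to appear block by block (k1, k2, ..., k7, k1, k2, ..., k7) rather than grouped by key (k1, k1, k2, k2, ..., k7, k7).
Sorry for my poor English; I am still working on it.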
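A side note on the "multiple class names" part: the queries above compare `@class` with the complete attribute string, so they stop matching as soon as the classes appear in a different order or another class is added. Below is a minimal sketch of matching individual class tokens instead, using only XPath 1.0 functions. The `hasClass()` helper and the shortened sample markup are assumptions made for illustration; they are not part of the original page.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate that is true when the
// class attribute contains $name as a whole, space-separated token.
function hasClass($name)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' ".$name." ')";
}

// Shortened, made-up sample: classes reordered, an extra class added,
// and one look-alike class name that must not match.
$html = <<<HTML
<div class="blackBd12 lstImv"><strong class="emp-fase imvFse">Get_text 1</strong></div>
<div class="lstImv blackBd12 extra"><strong class="imvFse emp-fase">Other Get_text 1</strong></div>
<div class="lstImvOther"><strong class="imvFse emp-fase">Not wanted</strong></div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Require both block classes, then look for the status element inside.
$query = '//div['.hasClass('lstImv').' and '.hasClass('blackBd12').']'
       . '//strong['.hasClass('emp-fase').']';

$found = array();
foreach ($xpath->query($query) as $node) {
    $found[] = $node->textContent;
}
echo implode("<br/>", $found);
```

Padding the attribute with spaces on both sides is what keeps `lstImv` from matching `lstImvOther`; a plain `contains(@class, "lstImv")` would accept both.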
This is the solution I arrived at:
<?php
$html = <<<HTML
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<span class="emp-loc-part2 infLoc">Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Get_text 6</li>
<li class="txtArea emp-un-area">Get_text 7</li>
<li class="txtToilet emp-un-bath">Get_text 8</li>
<li class="txtCar emp-un-park">Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
</div>
</div>
</div>
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
<span class="emp-loc-part2 infLoc">Other Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Other Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Other Get_text 6</li>
<li class="txtArea emp-un-area">Other Get_text 7</li>
<li class="txtToilet emp-un-bath">Other Get_text 8</li>
<li class="txtCar emp-un-park">Other Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
</div>
</div>
</div>
HTML;
// Suppress parser warnings so imperfect markup does not interrupt the output.
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
// One node per repeated block; its count drives the loop below.
$items = $xpath->query('//div[@class="lstImv blackBd12"]');
// Each query below searches the whole document, so item($i) lines up with
// block $i only when every block contains that element exactly once.
for($i = 0; $i < $items->length; $i++)
{
$status = $xpath->query('//strong[@class="imvFse emp-fase"]');
echo "Value :".$status->item($i)->nodeValue."<br/>";
$titulo = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
echo "Value :".$titulo->item($i)->nodeValue."<br/>";
$titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
echo "Value :".$titulo2->item($i)->nodeValue."<br/>";
$valor = $xpath->query('//em[@class="emp-valor-apartir"]');
echo "Value :".$valor->item($i)->nodeValue."<br/>";
$valor2 = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]');
echo "Value :".$valor2->item($i)->nodeValue."<br/>";
$dorm = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
echo "Value :".$dorm->item($i)->nodeValue."<br/>";
$tam = $xpath->query('//li[@class="txtArea emp-un-area"]');
echo "Value :".$tam->item($i)->nodeValue."<br/>";
}
?>
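The loop above searches the whole document on every pass and picks the `$i`-th hit, which only stays aligned while every block contains every element. A possible alternative, sketched here under assumptions, is to pass each block as the context node (the second argument of `DOMXPath::query()`) and start the expression with `.//`, so the search stays inside that block. The field names in `$fields` and the shortened sample markup are hypothetical, and the second sample block deliberately lacks one element to show that the rows stay aligned.

```php
<?php
// Hypothetical, trimmed-down sample using the class names from the question.
// The second block has no "emp-loc-part1" span on purpose.
$html = <<<HTML
<div class="lstImv blackBd12">
<strong class="imvFse emp-fase">Get_text 1</strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<ul><li class="txtBed emp-un-dorms">Get_text 6</li></ul>
</div>
<div class="lstImv blackBd12">
<strong class="imvFse emp-fase">Other Get_text 1</strong>
<ul><li class="txtBed emp-un-dorms">Other Get_text 6</li></ul>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Illustrative field names mapped to queries relative to one block.
$fields = array(
    'status' => './/strong[@class="imvFse emp-fase"]',
    'loc1'   => './/span[@class="emp-loc-part1 infLoc"]',
    'dorms'  => './/li[@class="txtBed emp-un-dorms"]',
);

$rows = array();
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = array();
    foreach ($fields as $name => $query) {
        // The context node limits the search to the current block.
        $node = $xpath->query($query, $block)->item(0);
        // A missing element becomes null instead of shifting later values.
        $row[$name] = $node ? trim($node->textContent) : null;
    }
    $rows[] = $row;
}

// Print block by block: k1, k2, k3, then k1, k2, k3 of the next block.
foreach ($rows as $i => $row) {
    foreach ($row as $name => $value) {
        echo "Block ".($i + 1)." ".$name." : ".$value."<br/>";
    }
}
```

Collecting the values into `$rows` first keeps extraction separate from output, so the same array could be encoded as JSON or written to a database instead of being echoed.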