Improving HTML parsing code with DOM in PHP
I wrote the following code to display, on an otherwise blank page, one piece of an external site. To do that I had to remove several nodes, and each node needed its own block of code, which would make maintenance almost unfeasible in a larger project.
My questions:
Is there a way to list everything we want to eliminate (footer, header, headerContent, etc.) in a single place?
Is there a smarter approach than deleting elements, one that simply shows the part I want (TABELA1)?
# Create a DOM parser object
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://www.sptrans.com.br/sac/solicitacoes.aspx');
$data = $dom->getElementById('TABELA1');

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "novidadeDestaque")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "headerLvl1")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "headerContent")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "novo_menu")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "footer")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "header")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[contains(attribute::id, "pageNovidades")]') as $e) {
    // Delete this node
    $e->parentNode->removeChild($e);
}

echo $dom->saveHTML();
?>
</body>
To remove all the unwanted elements with one short routine, put their ids in an array and loop over it:
$xpath = new DOMXPath($dom);
$idToDelete = [ 'novidadeDestaque', 'headerLvl1', ... ];
foreach ($idToDelete as $id) {
    // Remove every <div> whose id contains the current string
    foreach ($xpath->query('//div[contains(attribute::id, "'.$id.'")]') as $e) {
        $e->parentNode->removeChild($e);
    }
}
Note that you don't need to create a new DOMXPath object for each search: create it once per DOMDocument object and reuse it for every query.
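The loop over ids can also be collapsed into a single XPath query by joining the conditions with `or`. The following is only a minimal sketch: the inline `$html` string is hypothetical sample markup standing in for the fetched page, and the id list is shortened for illustration.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<div id="header">Header</div>'
      . '<div id="novo_menu">Menu</div>'
      . '<table id="TABELA1"><tr><td>Data</td></tr></table>'
      . '<div id="footer">Footer</div>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);        // one DOMXPath object, reused for every query

$idsToDelete = ['header', 'novo_menu', 'footer'];

// Build one predicate: contains(@id, "header") or contains(@id, "novo_menu") or ...
$conditions = array_map(
    function ($id) {
        return 'contains(@id, "' . $id . '")';
    },
    $idsToDelete
);
$query = '//div[' . implode(' or ', $conditions) . ']';

// Copy the matches into an array first, then remove each node from its parent.
foreach (iterator_to_array($xpath->query($query)) as $node) {
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
echo $result;
```

Because `contains()` does substring matching, the entry `header` also matches ids such as `headerLvl1` and `headerContent`, so the list can often be shorter than the original set of loops. If exact matches are needed, `@id = "header"` could be used in the predicate instead.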
To show only the element you want, pass that node to saveHTML():
$table = $dom->getElementById( 'MyTable' );
echo $dom->saveHTML( $table );
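As a self-contained illustration of this approach, the sketch below parses a small hypothetical HTML string (the `$html` content is an assumption made for the example) and prints only the table. It also guards against the case where no element has the requested id, since `getElementById()` returns `null` then.

```php
<?php
// Hypothetical sample markup; in the question this would come from loadHTMLFile().
$html = '<html><body><div id="header">Header</div>'
      . '<table id="MyTable"><tr><td>Cell</td></tr></table></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);

$table = $dom->getElementById('MyTable');

// Serialize only the table node; fall back to an empty string if the id is missing.
$output = ($table !== null) ? $dom->saveHTML($table) : '';
echo $output;
```

With this approach nothing has to be deleted: everything outside the selected node is simply never serialized.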
To get a complete HTML document that contains only the desired table, create a new DOMDocument and use importNode to copy the table into it:
$src = new DOMDocument();
$dst = new DOMDocument();
$src->loadHTML( $html );
$dst->loadHTML( '<html><head><title>Untitled</title></head><body></body></html>' );
$table = $src->getElementById( 'MyTable' );
// Pass true as the second argument so the table's rows and cells are copied too
$imported = $dst->importNode( $table, true );
$dst->getElementsByTagName( 'body' )->item(0)->appendChild( $imported );
echo $dst->saveHTML();
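One detail worth checking when importing: `importNode()` copies only the node itself unless its second argument (`$deep`) is `true`. The sketch below compares the two calls on a small hypothetical `$html` string (an assumption made for the example) and then builds the new page from the deep copy.

```php
<?php
// Hypothetical sample markup standing in for the source page.
$html = '<html><body><p>Intro</p>'
      . '<table id="MyTable"><tr><td>Cell</td></tr></table></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$src = new DOMDocument();
$src->loadHTML($html);
$dst = new DOMDocument();
$dst->loadHTML('<html><head><title>Untitled</title></head><body></body></html>');

$table = $src->getElementById('MyTable');

// Shallow import (the default): the <table> element without its rows and cells.
$shallow = $dst->importNode($table);
// Deep import: the element together with all of its descendants.
$deep = $dst->importNode($table, true);

$shallowHasChildren = $shallow->hasChildNodes();
$deepHasChildren    = $deep->hasChildNodes();

// Attach the deep copy to the <body> of the new document and serialize it.
$dst->getElementsByTagName('body')->item(0)->appendChild($deep);
$page = $dst->saveHTML();
echo $page;
```

Only the deep copy carries the table's content into the new document; the rest of the source page (here the `Intro` paragraph) is left behind.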