Improving HTML parsing with DOM in PHP

I wrote the following code to display, on an otherwise blank page, one piece of an external site. To do that I had to remove several nodes, and each node needed its own block of code, which would make maintenance almost unfeasible in a large project.

My questions:

  1. Is there a way to list everything we want to remove (footer, header, headerContent, etc.) in a single place?

  2. Is there a smarter way to clean the page: instead of deleting elements, just show what I want (TABLE1)?

    # Create a DOM parser object
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTMLFile('http://www.sptrans.com.br/sac/solicitacoes.aspx');
    $data = $dom->getElementById('TABELA1');

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "novidadeDestaque")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "headerLvl1")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "headerContent")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "novo_menu")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "footer")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "header")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "pageNovidades")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    echo $dom->saveHTML();
    ?>
    </body>

To remove the unwanted elements with a short routine, you can use an array:

$xpath = new DOMXPath($dom);
$idToDelete = [ 'novidadeDestaque', 'headerLvl1', ... ];

foreach( $idToDelete as $id )
{
    foreach($xpath->query('//div[contains(attribute::id, "'.$id.'")]') as $e ) {
        $e->parentNode->removeChild($e);
    }
}
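
The loop above can also be collapsed into a single XPath query by joining the conditions with `or`. The following is only a minimal sketch under stated assumptions: the helper name `removeDivsByIdFragments` and the sample markup are made up for illustration, the id fragments are assumed to contain no double quotes, and the arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): removes every <div>
// whose id contains at least one of the given fragments, using one query.
function removeDivsByIdFragments(DOMDocument $dom, array $fragments): int
{
    if ($fragments === []) {
        return 0;
    }

    // Builds: //div[contains(@id, "a") or contains(@id, "b") ...]
    // Assumption: fragments contain no double quotes.
    $conditions = array_map(
        fn(string $id): string => 'contains(@id, "' . $id . '")',
        $fragments
    );
    $query = '//div[' . implode(' or ', $conditions) . ']';

    $xpath   = new DOMXPath($dom);
    $removed = 0;

    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }

    return $removed;
}

// Sample markup, invented for this sketch.
$html = '<html><body>'
      . '<div id="header">H</div>'
      . '<div id="content"><table id="TABELA1"><tr><td>x</td></tr></table></div>'
      . '<div id="footer">F</div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);

$removedCount = removeDivsByIdFragments($dom, ['header', 'footer']);
echo $dom->saveHTML();
```

Because `contains()` matches substrings, the fragment `header` already covers ids such as `headerLvl1` and `headerContent`, so the list of fragments can probably be shorter than the list of ids.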

Please note that you don't need to create a new DOMXPath object for each search: create it once per DOMDocument object and reuse it.

To show only what you want, you can use this syntax:

$table = $dom->getElementById( 'MyTable' );
echo $dom->saveHTML( $table );
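
As a self-contained illustration of that syntax, here is a small sketch that loads an inline HTML string instead of a remote page. The markup is invented for the example, and the null check is an addition, since `getElementById()` returns `null` when no element carries the requested id.

```php
<?php
// Sample markup, invented for this sketch; 'MyTable' matches the id used above.
$html = '<html><body>'
      . '<div id="header">Header</div>'
      . '<table id="MyTable"><tr><td>Cell</td></tr></table>'
      . '<div id="footer">Footer</div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);

$table = $dom->getElementById('MyTable');

// Serialize only the table node; fall back to an empty string if it is missing.
$output = $table !== null ? $dom->saveHTML($table) : '';

echo $output;
```

Passing a node to `saveHTML()` serializes just that subtree, so nothing has to be deleted from the rest of the document.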

To get a complete HTML document containing only the desired table, you can create a new DOMDocument and use importNode to add your table:

$src = new DOMDocument();
$dst = new DOMDocument();

$src->loadHTML( $html );
$dst->loadHTML( '<html><head><title>Untitled</title></head><body></body></html>' );

$table    = $src->getElementById( 'MyTable' );
// Pass true as the second argument so the table's rows and cells are imported too
$imported = $dst->importNode( $table, true );

$dst->getElementsByTagName( 'body' )->item(0)->appendChild( $imported );

echo $dst->saveHTML();
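
The second argument of `importNode()` matters here: it defaults to `false`, which imports the element without its children. The sketch below, using invented sample markup, compares a shallow and a deep import before appending the deep copy to the new document.

```php
<?php
$src = new DOMDocument();
$dst = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them

// Sample markup, invented for this sketch.
$src->loadHTML('<html><body><table id="MyTable"><tr><td>Cell</td></tr></table></body></html>');
$dst->loadHTML('<html><head><title>Untitled</title></head><body></body></html>');

$table = $src->getElementById('MyTable');

$shallow = $dst->importNode($table);        // the <table> element without its children
$deep    = $dst->importNode($table, true);  // the <table> element with all descendants

// Only the deep copy carries the rows and cells.
$shallowHasChildren = $shallow->hasChildNodes();
$deepHasChildren    = $deep->hasChildNodes();

$dst->getElementsByTagName('body')->item(0)->appendChild($deep);
echo $dst->saveHTML();
```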
