
Improving HTML parsing code that uses DOM in PHP

I wrote the following code to display one piece of an external site on an otherwise blank page. To do that I had to remove several nodes, and each node needed its own block of code, which would make maintenance almost unfeasible in a big project.

My questions:

  1. Is there a way to list everything we want to eliminate (footer, header, headerContent, etc.) in a single place?

  2. Instead of deleting elements, is there a smarter way to show only what I want (TABELA1)?

    # Create a DOM parser object
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTMLFile('http://www.sptrans.com.br/sac/solicitacoes.aspx');
    $data = $dom->getElementById('TABELA1');

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "novidadeDestaque")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "headerLvl1")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "headerContent")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "novo_menu")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "footer")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "header")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//div[contains(attribute::id, "pageNovidades")]') as $e) {
        // Delete this node
        $e->parentNode->removeChild($e);
    }

    echo $dom->saveHTML();
    ?>
    </body>

To shorten the routine that removes the unwanted elements, you can put their IDs in an array and loop over it:

$xpath = new DOMXPath($dom);
$idToDelete = [ 'novidadeDestaque', 'headerLvl1', ... ];

foreach( $idToDelete as $id )
{
    foreach($xpath->query('//div[contains(attribute::id, "'.$id.'")]') as $e ) {
        $e->parentNode->removeChild($e);
    }
}

Please note that you don't need to create a new DOMXPath object for each search: you can create it only once per DOMDocument object.
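
The list of IDs can also be folded into a single XPath expression, so the document is queried only once. The following is a minimal sketch rather than a definitive implementation: the inline HTML is a stand-in for the remote page, and the helper name removeDivsByIdFragments is hypothetical (it is not part of the DOM extension). It also assumes the ID fragments contain no double quotes.

```php
<?php
// Hypothetical helper: removes every <div> whose id contains any of the
// given fragments, using one XPath query. Returns the number of removed nodes.
function removeDivsByIdFragments(DOMDocument $dom, array $fragments): int
{
    if ($fragments === []) {
        return 0;
    }

    // Build one predicate: contains(@id, "a") or contains(@id, "b") ...
    // Assumption: fragments contain no double quotes.
    $conditions = array_map(function (string $fragment): string {
        return 'contains(@id, "' . $fragment . '")';
    }, $fragments);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[' . implode(' or ', $conditions) . ']');

    $removed = 0;
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }

    return $removed;
}

// Stand-in markup (an assumption; the real page is not reproduced here).
$html = '<html><body>'
      . '<div id="header">top</div>'
      . '<div id="headerContent">menu</div>'
      . '<table id="TABELA1"><tr><td>data</td></tr></table>'
      . '<div id="footer">bottom</div>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$removed = removeDivsByIdFragments($dom, ['header', 'footer']);
$result  = $dom->saveHTML();
echo $result;
```

Because contains() matches substrings, the fragment "header" also covers "headerContent" and "headerLvl1", so overlapping entries in the list are redundant.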

To output only the element you want, pass it to saveHTML():

$table = $dom->GetElementById( 'MyTable' );
echo $dom->saveHTML( $table );
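
getElementById() returns null when no element has the requested id, so it is worth guarding the call before passing the result to saveHTML(). A minimal sketch, assuming inline stand-in markup instead of the remote page; the id TABELA1 comes from the question:

```php
<?php
// Stand-in markup (an assumption; the real page is not reproduced here).
$html = '<html><body><div id="header">top</div>'
      . '<table id="TABELA1"><tr><td>data</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$table = $dom->getElementById('TABELA1');

// Serialize only the table; fall back to an empty string if it is missing.
$fragment = ($table !== null) ? $dom->saveHTML($table) : '';
echo $fragment;
```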

To get a complete HTML document that contains only the desired table, create a new DOMDocument and use importNode to copy the table into it:

$src = new DOMDocument();
$dst = new DOMDocument();

$src->loadHTML( $html );
$dst->loadHTML( '<html><head><title>Untitled</title></head><body></body></html>' );

$table    = $src->GetElementById( 'MyTable' );
$imported = $dst->importNode( $table, true ); // true = deep copy, so the rows and cells are imported too

$dst->getElementsByTagName( 'body' )->item(0)->appendChild( $imported );

echo $dst->saveHTML();
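
Note that importNode() makes a shallow copy unless its second argument is true, which would leave the imported table without rows. A minimal runnable sketch that shows the difference, assuming inline stand-in markup and the id TABELA1 from the question:

```php
<?php
// Stand-in markup (an assumption; the real page is not reproduced here).
$html = '<html><body><div id="header">top</div>'
      . '<table id="TABELA1"><tr><td>data</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true);

$src = new DOMDocument();
$src->loadHTML($html);

$dst = new DOMDocument();
$dst->loadHTML('<html><head><title>Untitled</title></head><body></body></html>');

$table = $src->getElementById('TABELA1');

$shallow = $dst->importNode($table);        // the element without its descendants
$deep    = $dst->importNode($table, true);  // the element plus all descendants

$shallowHasChildren = $shallow->hasChildNodes();
$deepHasChildren    = $deep->hasChildNodes();

// Only the deep copy is attached to the new document.
$dst->getElementsByTagName('body')->item(0)->appendChild($deep);

$output = $dst->saveHTML();
echo $output;
```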
