How to find an h3 tag with a certain value

I have an HTML file with the following structure:

<h3>Heading 1</h3>
  <table>
   <!-- contains a <thead> and <tbody> which also contain several columns/lines -->
  </table>
<h3>Heading 2</h3>
  <table>
   <!-- contains a <thead> and <tbody> which also contain several columns/lines -->
  </table>

I want to get JUST the first table with all its content. So I load the HTML file:

<?php 
  $dom = new DOMDocument();
  libxml_use_internal_errors(true);
  $dom->loadHTML(file_get_contents('http://www.example.com'));
  libxml_clear_errors();
?>

All tables have the same classes and NO specific IDs. That's why the only way I could think of was to grab the h3 tag with the value "Heading 1". I already found this one, which works well for me. (However, since other tables and captions could be added later, that solution is fragile.)
How could I grab the h3 tag WITH the value "Heading 1"? And how could I select the table that follows it?

EDIT#1: I don't have access to the HTML File, so I can't edit it.
EDIT#2: My Solution (thanks to Martin Henriksen) for now is:

<?php
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML(file_get_contents('http://example.com'));
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('h3') as $element) {
      // Only act on the heading we are looking for
      if ($element->nodeValue == 'exampleString') {
        // With this markup the first nextSibling is the whitespace text node
        // after </h3>; the second one is the <table>
        $table = $element->nextSibling->nextSibling;
        $innerHTML = '';
        $children = $table->childNodes;
        foreach ($children as $child) {
          $innerHTML .= $child->ownerDocument->saveXML($child);
        }
        echo $innerHTML;
        file_put_contents("test.xml", $innerHTML);
      }
    }
  ?>

You can find any tag in HTML using the simple_html_dom.php class. You can download this file from https://sourceforge.net/projects/simplehtmldom/?source=typ_redirect

Then:

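<?php
include_once('simple_html_dom.php');

$htm  = "**YOUR HTML CODE**";
$html = str_get_html($htm);
// find() takes a CSS-style selector ("h3"), not a literal tag ("<h3>")
foreach ($html->find('h3') as $h3) {
    if (trim($h3->plaintext) == 'Heading 1') {
        echo "HTML code in h3 tag: ";
        print_r($h3->innertext);
        // next_sibling() returns the next element, here the <table>
        echo $h3->next_sibling()->outertext;
    }
}
?>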

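You can fetch all the DOMElements with the tag h3 and check what value each one holds by accessing nodeValue. Once you have found the h3 tag, you can select the next node in the DOM tree with nextSibling. Note that nextSibling can return a whitespace text node rather than the table, so you may have to step past it.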

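foreach ($dom->getElementsByTagName('h3') as $element) {
    if (trim($element->nodeValue) === 'Heading 1') {
        // nextSibling may be a whitespace text node, so advance to the next element
        $table = $element->nextSibling;
        while ($table !== null && $table->nodeType !== XML_ELEMENT_NODE) {
            $table = $table->nextSibling;
        }
        break;
    }
}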
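
As an alternative to walking siblings by hand, the same lookup can probably be written as a single XPath query with DOMXPath. This is only a sketch: the sample markup and the variable names ($html, $tableHtml) are assumptions made for illustration, and it assumes the table is a direct following sibling of the h3, as in the structure shown in the question.

```php
<?php
// Sample markup mirroring the structure in the question (assumed for illustration).
$html = <<<HTML
<h3>Heading 1</h3>
  <table><tbody><tr><td>first</td></tr></tbody></table>
<h3>Heading 2</h3>
  <table><tbody><tr><td>second</td></tr></tbody></table>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings from imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Select the first <table> sibling that follows the <h3> whose
// whitespace-normalized text is exactly "Heading 1".
$nodes = $xpath->query('//h3[normalize-space() = "Heading 1"]/following-sibling::table[1]');

$tableHtml = '';
if ($nodes !== false && $nodes->length > 0) {
    // Serialize the matched <table> element including its children.
    $tableHtml = $dom->saveHTML($nodes->item(0));
}
echo $tableHtml, PHP_EOL;
```

Because following-sibling::table only considers element nodes, the whitespace text node between the heading and the table does not need special handling, and the query keeps working if further headings or tables are added later in the page.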
