Wrap an H3 tag and all UL tags under it in a div

I have a structure that goes like this:

<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="third_set">My Third Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>

I extracted this from a web page using DOMDocument. I need to iterate through 9000 pages, all of which have slight variations. So the "Third Heading" might in fact be a table in some instances instead of another h3.

What I am trying to do is wrap a div around the second heading and the ul tags that follow it, closing the div as soon as it hits anything that is not a ul tag. So the result would be something like this:

<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<div class="second_heading">
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
</div>
<h3><span class="header" id="third_set">My Third Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>

I'm thinking of preg_replace, but I'm not sure how to express the logic of "close the div when the last closing ul tag is found".

You can achieve this while still working with your DOMDocument. I'm assuming you have a variable called $node which is the parent node of the HTML you show in your question. In that case, you can find all the child elements of that node using DOMXPath, then iterate through them until you reach the second <h3>. Replace that header with a new <div>, move the header into it, and keep appending the <ul> elements that follow until you reach the first non-<ul> element after the second header:

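$div = $doc->createElement('div');
// class name taken from the desired output in the question
$div->setAttribute('class', 'second_heading');
$xpath = new DOMXPath($doc);
$headers = 0;
// './*' selects only the element children of $node, in document order
foreach ($xpath->query('./*', $node) as $child) {
    switch ($child->nodeName) {
        case 'h3':
            $headers++;
            if ($headers == 2) {
                // put the div where the 2nd header was, then move the header into it
                $node->replaceChild($div, $child);
                $div->appendChild($child);
            }
            elseif ($headers == 3) {
                // a 3rd header ends the section, exit the loop
                break 2;
            }
            break;
        case 'ul':
            // appendChild moves the node, so the ul leaves $node and joins the div
            if ($headers == 2) $div->appendChild($child);
            break;
        default:
            // if a non-ul element after the 2nd header, exit the loop
            if ($headers == 2) break 2;
            break;
    }
}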

Demo on 3v4l.org
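
For trying the idea on a single page outside your crawler, here is a minimal, self-contained sketch. It is a variant of the loop above rather than the same code: it picks the n-th <h3> child with XPath and then walks nextSibling until it meets something that is not a <ul>. The function name wrapHeadingSection, its parameters, and the use of the <body> element as $node are assumptions made for illustration; they do not come from the question.

```php
<?php
// Hypothetical helper (not from the original answer): wraps the $n-th <h3>
// child of $node, plus the <ul> siblings directly after it, in a new
// <div class="$class">. Returns false if $node has no such <h3> child.
function wrapHeadingSection(DOMDocument $doc, DOMNode $node, int $n, string $class): bool
{
    $xpath = new DOMXPath($doc);
    // XPath positions are 1-based, so ./h3[2] is the second <h3> child.
    $h3 = $xpath->query("./h3[$n]", $node)->item(0);
    if ($h3 === null) {
        return false;
    }

    $div = $doc->createElement('div');
    $div->setAttribute('class', $class);
    // Put the div where the header was, then move the header into it.
    $node->replaceChild($div, $h3);
    $div->appendChild($h3);

    // Move following siblings into the div. Whitespace-only text nodes are
    // moved along with the <ul> elements; anything else ends the section.
    for ($next = $div->nextSibling; $next !== null; $next = $div->nextSibling) {
        $isUl    = $next instanceof DOMElement && $next->nodeName === 'ul';
        $isBlank = $next instanceof DOMText && trim($next->nodeValue) === '';
        if (!$isUl && !$isBlank) {
            break;
        }
        $div->appendChild($next);
    }
    return true;
}

// Usage on a fragment shaped like the one in the question. Here the third
// section starts with a table instead of an h3, one of the variations
// the question mentions.
$html = <<<HTML
<h3><span class="header" id="first_set">My Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<h3><span class="header" id="second_set">My Second Heading</span></h3>
<ul><li>Text Text Text</li></ul>
<ul><li>Text Text Text</li></ul>
<table><tr><td>Not a list</td></tr></table>
<ul><li>Text Text Text</li></ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
// loadHTML() wraps a fragment in <html><body>, so <body> plays the role of $node.
$body = $doc->getElementsByTagName('body')->item(0);

wrapHeadingSection($doc, $body, 2, 'second_heading');

foreach ($body->childNodes as $child) {
    echo $doc->saveHTML($child);
}
```

Because the wrapping stops at the first sibling that is neither a <ul> nor blank text, the <ul> after the table stays outside the div. Across 9000 pages you would presumably call the helper once per loaded document and check the return value for pages that have no second <h3>.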
