I'm trying to use PHP DOM to parse an HTML file that I want to translate into JSON. Unfortunately, the HTML DOM is fairly flat (and I have no way to change that). By flat I mean the structure is something like this:
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
I need to get the <h2>s and treat the <span>s that follow each one as its children. I'm not set on using PHP DOM if there's a better alternative; it's simply what I found in an answer I came across, so please feel free to suggest anything. What I really need is to convert this HTML string into JSON, and PHP DOM looks like my best bet so far.
<?php
$XML = <<<XML
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title </h2>
<span>child node</span>
<span>another child</span>
XML;

$dom = new DOMDocument;
$dom->loadHTML($XML); // libxml wraps the fragment in <html><body>
$xp = new DOMXPath($dom);

$new = new DOMDocument;
$root = $new->createElement('root');
$current = null;

// Walk only the top-level elements; the text nodes between them are not selected.
foreach ($xp->query('/html/body/*') as $node) {
    if ($node->nodeName === 'h2') {
        // A new heading closes the previous group and opens the next one.
        if ($current !== null) {
            $root->appendChild($current);
        }
        $current = $new->createElement('div');
    }
    if ($current === null) {
        continue; // skip anything that appears before the first <h2>
    }
    $current->appendChild($new->importNode($node, true));
}

// Append the final group, which the loop itself never flushes.
if ($current !== null) {
    $root->appendChild($current);
}
$new->appendChild($root);

// saveXML() yields well-formed XML, which simplexml_load_string() requires.
$xml2 = simplexml_load_string($new->saveXML());
echo json_encode($xml2);
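If the intermediate XML document is not needed, the same grouping can be done by building a plain PHP array and encoding it directly. The following is only a minimal sketch under a few assumptions: the function name groupHeadings() and the output keys "title" and "children" are made up for illustration and are not from the original post, every top-level <span> is assumed to belong to the nearest preceding <h2>, and the input is assumed to be ASCII or otherwise safe for loadHTML()'s default encoding handling.

```php
<?php
// Hypothetical helper (not from the original post): pairs each <h2> with
// the <span> elements that follow it, up to the next <h2>.
function groupHeadings(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $groups = [];
    foreach ($xp->query('/html/body/*') as $node) {
        if ($node->nodeName === 'h2') {
            // Each heading starts a new group.
            $groups[] = ['title' => trim($node->textContent), 'children' => []];
        } elseif ($node->nodeName === 'span' && $groups) {
            // Attach the span to the most recent heading; spans that
            // appear before any heading are ignored.
            $groups[count($groups) - 1]['children'][] = trim($node->textContent);
        }
    }
    return $groups;
}

$html = <<<HTML
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
HTML;

echo json_encode(groupHeadings($html), JSON_PRETTY_PRINT);
```

This keeps the JSON shape under your control (a list of objects, one per heading) rather than depending on how SimpleXML serializes repeated elements.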