
Parsing HTML with PHP

I can't get the data between the tags into arrays:

// Load the HTML string from file and create a SimpleXMLElement
$html_string = file_get_contents("data/csr.html"); /*the string really is in $html_string*/
$root = new SimpleXMLElement($html_string);

The problem starts here, when I try to get the values between the div, h2 and span tags into arrays:

// Fetch all div, h2 and span values
$divArray = $hdlsArray = $dtlsArray = array();
foreach ($root->div as $div) {
    $divArray[] = $div;
    echo "".$div."<br />";
}
foreach ($root->h2 as $h2) {
    $hdlsArray[] = $h2;
    echo "".$h2."<br />";
}
foreach ($root->span as $span) {
    $dtlsArray[] = $span;
    echo "".$span."<br />";
}

The result is a blank page instead of the actual tag data.
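A likely cause is that `$root->div` only iterates over `div` elements that are direct children of the root element. In a typical page the root is `<html>` and the tags sit deeper, inside `<body>`, so each loop runs zero times and nothing is echoed. A second possible cause: if the file is not well-formed XML, the `SimpleXMLElement` constructor throws an exception. That can also look like a blank page when errors are not displayed.

Below is a minimal sketch that assumes the file is well-formed XHTML. It uses `SimpleXMLElement::xpath()` with `//` to search the whole tree, and it casts each node to a string. The sample markup is a hypothetical stand-in for `data/csr.html`.

```php
<?php
// Hypothetical stand-in for data/csr.html. SimpleXMLElement only accepts
// well-formed XML, so this sample is written as XHTML.
$html_string = '<html><body>'
    . '<div>First div</div>'
    . '<h2>Headline</h2>'
    . '<p><span>Detail</span></p>'
    . '</body></html>';

$root = new SimpleXMLElement($html_string);

// xpath('//tag') searches every level of the document, whereas $root->div
// only looks at direct children of the root <html> element.
$divArray = $hdlsArray = $dtlsArray = array();
foreach ($root->xpath('//div') as $div) {
    $divArray[] = (string) $div; // cast so the array holds text, not objects
}
foreach ($root->xpath('//h2') as $h2) {
    $hdlsArray[] = (string) $h2;
}
foreach ($root->xpath('//span') as $span) {
    $dtlsArray[] = (string) $span;
}

echo implode("<br />", array_merge($divArray, $hdlsArray, $dtlsArray)), "<br />";
```

If the real file declares the XHTML namespace (`xmlns="http://www.w3.org/1999/xhtml"`), a plain `//div` will not match. You would then need to register a prefix with `registerXPathNamespace()` and query something like `//x:div` instead.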

As an alternative to SimpleXMLElement, I suggest Simple HTML DOM (online manual). I've used it before and was very satisfied with the results. It lets you use jQuery-like selectors, so fetching all div, h2 and span values is fairly simple.
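Simple HTML DOM is a separate download, so its API is not reproduced here. If you would rather use only what ships with PHP, here is a sketch based on the built-in DOM extension instead. This is a swapped-in technique, not the library recommended above. `DOMDocument::loadHTML()` tolerates ordinary HTML that is not valid XML, and `getElementsByTagName()` finds elements at any depth. The sample markup and the `collectText` helper are hypothetical.

```php
<?php
// Hypothetical sample markup: the unclosed <p> and the bare <br> are fine in
// HTML but would make the SimpleXMLElement constructor throw.
$html_string = '<html><body><div>First div</div><h2>Headline</h2>'
    . '<p>Intro<br><span>Detail</span></body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html_string);
libxml_clear_errors();

// Hypothetical helper: text content of every element with the given tag
// name, searched at any depth in the document.
function collectText(DOMDocument $doc, string $tag): array
{
    $values = array();
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $values[] = $node->textContent;
    }
    return $values;
}

$divArray  = collectText($doc, 'div');
$hdlsArray = collectText($doc, 'h2');
$dtlsArray = collectText($doc, 'span');

foreach (array_merge($divArray, $hdlsArray, $dtlsArray) as $value) {
    echo htmlspecialchars($value), "<br />";
}
```

Note that `textContent` includes the text of nested child elements. Casting a SimpleXMLElement to a string, by contrast, returns only the element's own direct text.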

This page says (about SimpleXML) "the only problem with it is that it'll only load valid XML" but may provide a workaround for HTML.
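One such workaround, sketched under the assumption that you want to keep the SimpleXML API, has two steps. First, let `DOMDocument::loadHTML()` do the lenient HTML parsing. Then hand the resulting tree to SimpleXML with `simplexml_import_dom()`. The sample markup is hypothetical.

```php
<?php
// Hypothetical markup that is valid HTML but not well-formed XML
// (unclosed <p>, bare <br>).
$html_string = '<html><body><div>First div</div><h2>Headline</h2>'
    . '<p>Intro<br><span>Detail</span></body></html>';

libxml_use_internal_errors(true); // keep parser warnings off the page
$doc = new DOMDocument();
$doc->loadHTML($html_string);
libxml_clear_errors();

// Convert the DOM tree into a SimpleXMLElement rooted at <html>.
$root = simplexml_import_dom($doc);

// Search the whole tree with XPath and store plain strings, not objects.
$divArray  = array_map('strval', $root->xpath('//div'));
$hdlsArray = array_map('strval', $root->xpath('//h2'));
$dtlsArray = array_map('strval', $root->xpath('//span'));

print_r($dtlsArray);
```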

The 'Related Questions' on Stack Overflow include this one, but it describes HTML inside valid XML tags.
