How to output plain text with PHP DOMDocument?

I'm using this code (thank you, Lawrence) to parse an HTML table:

<?php
$html = file_get_contents('http://www.example.com');
$dom = new DOMDocument();
@$dom->loadHTML($html);

// TUE 1 1 4.37 6.39 1.08 5.35 9.18 6.00 1.30 6.30 7.42 9.40
echo '
<table>
    <tr>';
foreach($dom->getElementsByTagName('table') as $table) {
    echo innerHTML($table->getElementsByTagName('tr')->item(9));
}
echo '
    </tr>
</table>';

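// Returns the serialized HTML of all child nodes of $current.
function innerHTML($current){
    $ret = "";
    // childNodes may be null for a missing row or a node type without children
    $nodes = @$current->childNodes;
    if(!empty($nodes)){
        foreach($nodes as $v){
            // Import each child into a scratch document so it can be serialized alone
            $tmp = new DOMDocument();
            $tmp->appendChild($tmp->importNode($v, true));
            $ret .= $tmp->saveHTML();
        }
    }
    // Always return a string (empty when there are no child nodes)
    return $ret;
}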
?>

The problem is that it outputs original HTML code, so how can I output plain text?

I have tried these changes, but they didn't work:

return $ret->textContent;
return $ret->nodeValue;
return $ret->plaintext;

echo innerHTML($table->getElementsByTagName('tr')->item(9)->textContent);
echo innerHTML($table->getElementsByTagName('tr')->item(9)->nodeValue);
echo innerHTML($table->getElementsByTagName('tr')->item(9)->plaintext);

The solution is actually very simple: use the strip_tags function.

echo strip_tags(innerHTML($table->getElementsByTagName('tr')->item(9)));

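It takes the string returned by innerHTML() and removes all of the HTML tags, leaving plain text. HTML entities such as &amp; are left encoded, so pass the result through html_entity_decode if you need the literal characters.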
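
When the markup is not needed at all, reading textContent from the DOM nodes directly avoids the round trip through innerHTML(). The attempts in the question fail because $ret is a string rather than a DOM node, and because innerHTML() expects a node rather than a string. A minimal sketch, assuming a small inline table in place of the real page (the sample $html and the row index 1 are illustrative, not from the original):

```php
<?php
// Illustrative sample markup; the real page and its row index 9 are not reproduced here.
$html = '<table>
  <tr><th>Day</th><th>A</th><th>B</th></tr>
  <tr><td>TUE</td><td><b>4.37</b></td><td>6.39</td></tr>
</table>';

libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// item() returns null when the index does not exist, so guard before using it.
$row = $dom->getElementsByTagName('tr')->item(1);

$cells = [];
if ($row !== null) {
    foreach ($row->getElementsByTagName('td') as $cell) {
        // textContent is the text of the cell and all its descendants, without tags.
        $cells[] = trim($cell->textContent);
    }
}

// Join the cells with spaces so adjacent values do not run together.
$plain = implode(' ', $cells);
echo $plain, PHP_EOL;
```

Collecting the cells one by one keeps a separator between values; calling textContent on the whole row would concatenate them whenever the source markup has no whitespace between cells.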

The html2text library converts HTML content to text. It uses PHP's DOM methods to iterate over all the elements and extract the text from the given HTML.

Usage:

$text = convert_html_to_text($html);
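
The exact entry point may differ between html2text packages and versions, so check the README of the one you install. To illustrate the general idea described above (walking the DOM and collecting text), here is a minimal sketch that uses only PHP's built-in DOM extension. It is not the library's implementation: sketch_html_to_text() and collect_text() are hypothetical names, and the set of tags treated as line-ending is an assumption.

```php
<?php
// Hypothetical helper: recursively collect the text below a DOM node.
function collect_text(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }
    // Skip content that is never meant to be displayed.
    if (in_array($node->nodeName, ['script', 'style'], true)) {
        return '';
    }
    $out = '';
    // firstChild is null for nodes without children, so the loop is safe for any node type.
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        $out .= collect_text($child);
    }
    // Assumed set of tags that end a line in the text output.
    if (in_array($node->nodeName, ['p', 'div', 'br', 'tr', 'li', 'h1', 'h2', 'h3'], true)) {
        $out .= "\n";
    }
    // Keep a space between table cells.
    if (in_array($node->nodeName, ['td', 'th'], true)) {
        $out .= ' ';
    }
    return $out;
}

// Hypothetical entry point: parse the HTML, walk it, and tidy the whitespace.
function sketch_html_to_text(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $text = collect_text($dom);
    // Collapse runs of spaces and tabs, trim each line, and drop empty lines.
    $lines = array_map('trim', explode("\n", preg_replace('/[ \t]+/', ' ', $text)));
    return implode("\n", array_filter($lines, 'strlen'));
}

$text = sketch_html_to_text('<p>Hello <b>world</b></p><table><tr><td>TUE</td><td>4.37</td></tr></table>');
echo $text, PHP_EOL;
```

A real converter handles many more cases (links, lists, character encodings), which is the reason to prefer a maintained library over a hand-rolled walker like this one.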
