I'm trying to use the DOMDocument class to load and analyse an HTML fragment (it doesn't include the <html> and <body> tags). There is a lot of garbage left over from MS Word when it was converted into HTML, so I'm getting warning messages such as DOMDocument::loadHTML(): Tag o:p invalid in Entity, line: 69 ddtest.d8.drush.inc:68. Here is the relevant code:
$dom = new DOMDocument;
//load the html into the object
$dom->loadHTML($row->body_value);
I've tried to get rid of the warning messages by using this:
$dom = new DOMDocument;
//load the html into the object
$dom->loadHTML($row->body_value, LIBXML_NOWARNING);
But it has no effect; the warning messages are still displayed. What am I doing wrong?
You could try using libxml's own error handling, perhaps like this:
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->standalone = true;
$dom->strictErrorChecking = false;
$dom->substituteEntities = true;
$dom->recover = true;
$dom->formatOutput = false;
$dom->loadHTML($row->body_value);
// Discard the errors collected during parsing.
libxml_clear_errors();
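If you want to see what the parser complained about rather than discard it, the same approach can be wrapped in a small helper. This is only a sketch: the function name loadHtmlFragmentQuietly and the sample fragment are made up for illustration and are not part of the code above. It uses libxml_get_errors() to read the collected messages and puts the previous libxml_use_internal_errors() setting back afterwards, so other code is not affected.

```php
<?php
// Hypothetical helper (not from the original answer): loads an HTML fragment
// while collecting libxml parse errors instead of emitting PHP warnings.
function loadHtmlFragmentQuietly(string $html): array
{
    // Switch to internal error handling and remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Each entry is a LibXMLError object with level, code, message and line.
    $errors = libxml_get_errors();

    // Empty the error buffer and restore the caller's setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $errors];
}

// Sample fragment with Word-style markup, invented for illustration.
[$dom, $errors] = loadHtmlFragmentQuietly('<p>Hello<o:p></o:p> world</p>');

foreach ($errors as $error) {
    // Log or inspect as needed; here the message is simply printed.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

In your case you would pass $row->body_value instead of the sample string. Another commonly suggested option, not shown here, is to pass LIBXML_NOERROR | LIBXML_NOWARNING to loadHTML(), on the assumption that messages such as "Tag o:p invalid" are raised by libxml at error level rather than warning level, which would explain why LIBXML_NOWARNING alone had no effect.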