Parsing HTML with formatted text

Question

I'm parsing an HTML web page with DOMDocument.

Here is my code:

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$input = file_get_contents($url); // URL passed in as a parameter
$doc->loadHTML( $input );
$xpath = new DOMXPath($doc);
$article = $xpath->query('//div[@class="entry-container fix"]');

$article now holds everything inside the "entry-container fix" div.

But the text on the web page is formatted. A simple example:

<div> 
   <p> Text <strong> Strong text </strong> </p>
</div>

With my code, I lose all the bold and italic text, the paragraphs, etc. Is there a way to get the text with its formatting?

Answer 1

Why not use the saveHTML function to extract that HTML (documentation: http://php.net/manual/fr/domdocument.savehtml.php)? When you pass it a node, it returns the markup for that node only, tags included. It would look something like this:

$sFormated = $doc->saveHTML($article->item(0));
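A slightly fuller sketch of the same idea, in case it helps. The inline $html string stands in for the fetched page, and the innerHtml() helper is a hypothetical name introduced here, not part of DOMDocument. It assumes you may want the div's contents without the wrapping div tag, and it guards against the query matching nothing:

```php
<?php
// Hypothetical sample markup standing in for the page fetched in the question.
$html = '<html><body><div class="entry-container fix">'
      . '<p>Text <strong>Strong text</strong></p>'
      . '</div></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@class="entry-container fix"]');

// Hypothetical helper: serialize only the children of a node,
// so the wrapping element itself is left out.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

// Empty strings if the XPath query matched nothing.
$outer = $nodes->length > 0 ? $doc->saveHTML($nodes->item(0)) : '';
$inner = $nodes->length > 0 ? innerHtml($nodes->item(0)) : '';
```

Here $outer keeps the div itself, while $inner holds only the paragraphs and inline tags inside it.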

Answer 1 was accepted on 2016-03-10 17:39:43.