Parsing HTML with formatted text

Question

I'm parsing an HTML web page with DOMDocument.

Here is my code:

$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$input = file_get_contents($url); // URL passed in as a parameter
$doc->loadHTML($input);
$xpath = new DOMXPath($doc);
// Select the div whose class attribute is exactly "entry-container fix"
$article = $xpath->query('//div[@class="entry-container fix"]');

In $article I have all the text inside the "entry-container fix" div.

But the text on the web page is formatted. A simple example:

<div> 
   <p> Text <strong> Strong text </strong> </p>
</div>

With my code, I lose all the bold and italic text, all the paragraphs, etc. Is there a way to get the text with its formatting?

Answer 1

Why not use the saveHTML function to extract that HTML (here is the link: http://php.net/manual/fr/domdocument.savehtml.php)? It would look something like this:

$sFormated = $doc->saveHTML($article->item(0));
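
To see the difference between reading a node's text and serializing it, here is a minimal, self-contained sketch. The sample markup in $html is a hypothetical stand-in for the fetched page, and the libxml_use_internal_errors() call is an addition that was not in the question; it only keeps parser warnings from being printed. The loop over childNodes is one possible way to get the markup inside the div without the wrapping div tag.

```php
<?php
// Hypothetical sample markup standing in for the page fetched from $url.
$html = '<html><body><div class="entry-container fix">'
      . '<p>Text <strong>Strong text</strong></p>'
      . '</div></body></html>';

// Assumption: collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@class="entry-container fix"]')->item(0);

// textContent flattens the subtree to plain text, so the tags are lost.
$plain = $div->textContent;

// saveHTML() with a node argument serializes that node, tags included.
$outer = $doc->saveHTML($div);

// Serializing each child in turn yields only the markup inside the div.
$inner = '';
foreach ($div->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

echo $plain, "\n";
echo $outer, "\n";
echo $inner, "\n";
```

If the XPath query matches nothing, item(0) returns null, so real code would want to check for that before calling saveHTML().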

Answer 1
1 accepted 2016-03-10 17:39:43

使用格式化文本解析HTML

问题描述

1 个解决方案

解决方案1 1 已采纳 2016-03-10 17:39:43

解决方案1
1 已采纳 2016-03-10 17:39:43