PHP: grab the HTML between <pre> tags

Question

I can't figure out how to grab only the HTML content from inside <pre> and </pre> tags with PHP5.

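Taking the document below as an example, I want to grab the 2 pre-tagged regions (or more, the number is dynamic) and push them into an array.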

 blablabla <pre>save this really</pre> not this <pre>save this too really </pre> but not this

How can I push the regions between the pre tags of an HTML file on another server into an array?

Answer 1

I'd suggest using XPath:

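// $html holds the markup to parse (for example, a page already fetched as a string)
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the text content of every <pre> element in the document
$pre_tags = array();
foreach ($xpath->query('//pre') as $node) {
    $pre_tags[] = $node->nodeValue;
}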
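
A slightly fuller sketch of the same XPath idea, for illustration only. The helper name extractPre and the sample string are hypothetical, not part of the answer. It assumes PHP 7 or later and wraps the parse in libxml_use_internal_errors() so that sloppy markup does not emit warnings. Because nodeValue returns only the text of a node, the sketch also rebuilds the inner HTML by serializing each child node, which may be closer to what the question asks for.

```php
<?php
// Hypothetical helper: returns the text and the inner HTML of every <pre> element.
function extractPre(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = ['text' => [], 'html' => []];
    foreach ($xpath->query('//pre') as $node) {
        // Text only, with nested tags stripped.
        $result['text'][] = $node->nodeValue;

        // Inner HTML: serialize each child node and concatenate.
        $inner = '';
        foreach ($node->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $result['html'][] = $inner;
    }
    return $result;
}

// Sample based on the question, with a nested <b> added to show the difference.
$sample = ' blablabla <pre>save this really</pre> not this '
        . '<pre>save <b>this</b> too really </pre> but not this';
$pre = extractPre($sample);
```

For a file on another server, the string passed in could come from file_get_contents() with a URL, provided allow_url_fopen is enabled, or from cURL; that part is left out here so the sketch runs without network access.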

Answer 2

Assuming the HTML is well-formed, you can do the following:

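$pos = 0;
$insideTheDiv = array();
// Find each literal "<pre>" and copy everything up to the next "</pre>"
while (($pos = strpos($theHtml, "<pre>", $pos)) !== false) {
    $pos += 5; // skip past "<pre>" (5 characters)
    $endPrePos = strpos($theHtml, "</pre>", $pos);
    if ($endPrePos === false) {
        break; // unclosed <pre>: stop scanning
    }
    $insideTheDiv[] = substr($theHtml, $pos, $endPrePos - $pos);
}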

When the loop finishes, $insideTheDiv should be an array holding the contents of every pre tag.

Demo: http://codepad.viper-7.com/X15l7P (it strips the newlines from the output)
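
The loop above matches only the literal string "<pre>", so an opening tag with attributes or in upper case would be missed. A possible variation is sketched below; the function name extractPreBySearch is hypothetical, and it assumes PHP 7 or later and that no attribute value contains a ">" character. It is still plain string searching, not a real HTML parser.

```php
<?php
// Hypothetical helper: string-search extraction that tolerates attributes and case.
function extractPreBySearch(string $html): array
{
    $found = [];
    $pos = 0;
    while (($open = stripos($html, '<pre', $pos)) !== false) {
        // Make sure the tag name is exactly "pre", not a longer name that starts with it.
        $next = $html[$open + 4] ?? '';
        if ($next !== '>' && !ctype_space($next)) {
            $pos = $open + 4;
            continue;
        }
        // The content starts right after the ">" that closes the opening tag.
        $start = strpos($html, '>', $open);
        if ($start === false) {
            break;
        }
        $start++;
        $end = stripos($html, '</pre>', $start);
        if ($end === false) {
            break; // unclosed <pre>: stop scanning
        }
        $found[] = substr($html, $start, $end - $start);
        $pos = $end + 6; // continue after "</pre>" (6 characters)
    }
    return $found;
}

$fromQuestion = extractPreBySearch(
    ' blablabla <pre>save this really</pre> not this <pre>save this too really </pre> but not this'
);
$withAttributes = extractPreBySearch('<PRE class="x">a</PRE>');
```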

Answer 3

You can simply use a regular expression to extract everything inside the pre tags.

In Python, this would be:

re.compile('<pre>(.*?)</pre>', re.DOTALL).findall(html)
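
Since the question is about PHP, a rough PHP counterpart of that pattern might look like the sketch below. It uses preg_match_all() with the s modifier, which plays the role of re.DOTALL by letting the dot match newlines. The variable names and the sample string are illustrative, and like the Python version it only matches a bare <pre> tag without attributes.

```php
<?php
$html = ' blablabla <pre>save this really</pre> not this <pre>save this too really </pre> but not this';

// "~" is used as the delimiter so the "/" in "</pre>" needs no escaping.
// (.*?) is non-greedy, so each match stops at the nearest closing tag.
preg_match_all('~<pre>(.*?)</pre>~s', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured contents.
$preContents = $matches[1];
```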

Answer 1
1, accepted, 2011-11-09 05:08:33

Answer 2
0, 2011-11-09 03:31:52

Answer 3
0, 2011-11-14 16:32:44