
PHP XPath: getting innerHTML with inner HTML tags

I have an HTML file formatted like this:

<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>

<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>

I wrote some PHP code to automatically get each p1 and its detail and insert them into my MySQL table.

This is my code:

$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");
$xpath = new DomXpath($doc);
$subject = $xpath->query('//p');

for ($i = 0; $i < $subject->length-1; $i++) {
    if ($subject->item($i)->getAttribute("class") == "p1")
        echo $subject->item($i)->nodeValue;
}
...

This is not my full code, but the problem is:

echo $subject->item($i)->nodeValue;

Which gives me <p>detail important</p>, without the <span></span> tags.

It is important to keep the span tags around the "important" part of the detail. Is there any function that can do that without much trouble?

Thanks in advance.

I found the answer to my question :) Thanks to SimpleHTMLDOM:

$detail = '';
foreach ($html->find('p') as $element) {
    switch ($element->class) {
        case 'p1':
            $subject = $element;
            break;
        case 'p2':
            // Casting the element to a string yields its HTML, tags included.
            $detail .= html_entity_decode($element);
            break;
    }
}

The trick is in:

html_entity_decode($element);
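For comparison, the same subject/detail pairing can be sketched with PHP's built-in DOM extension alone. This is only an illustration under a few assumptions: the sample markup is inlined, the `$rows` array is a hypothetical stand-in for the database insert, and each p1 is assumed to be followed directly by its p2 sibling.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<p class="p1">first</p><p class="p2">detail <span>important</span></p>' .
    '<p class="p1">second</p><p class="p2">more<span>notes</span></p>'
);
$xpath = new DOMXPath($doc);

$rows = []; // hypothetical stand-in for the rows to insert into MySQL
foreach ($xpath->query('//p[@class="p1"]') as $subject) {
    // Take the first following <p> sibling, and only if it has class "p2".
    $detail = $xpath->query('following-sibling::p[1][@class="p2"]', $subject)->item(0);
    $rows[] = [
        'subject' => $subject->textContent,
        // saveHTML() with a node argument serializes that node with its tags.
        'detail'  => $detail ? $doc->saveHTML($detail) : '',
    ];
}
print_r($rows);
```

Here the detail keeps its wrapping <p class="p2"> tag, much like the string cast in the SimpleHTMLDOM version.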

Old question, but there is a one-liner. The OP should use:

$subject = $xpath->query('//p/*');

and then:

echo $doc->saveHtml($subject->item($i));

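With the * you get the child elements of each paragraph (without the wrapping paragraph tag); without the * you get the HTML including the wrapping paragraph. Note that * matches element nodes only, so text sitting directly inside the paragraph is not selected; //p/node() selects the text nodes as well.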

Full example:

$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses libxml parse warnings
$xpath = new DOMXPath($dom);
// with /* you get the child elements of the div; without it you get the div itself
$nodes = $xpath->query('.//div/*');
// saveHTML() expects a single node, not a node list
$innerHtml = $dom->saveHTML($nodes->item(0));
var_dump($innerHtml);

Output: <p>ciao questa è una <b>prova</b>.</p>
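Because a * step skips text nodes, one way to get the complete inner HTML of a paragraph is to serialize each of its child nodes in turn. The following is a minimal sketch, not a built-in API: `innerHtml` is a hypothetical helper name, and the sample markup is taken from the question.

```php
<?php
// Hypothetical helper: serialize every child of $node, text nodes included.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument dumps only that node's markup.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<p class="p1">subject</p><p class="p2">detail <span>important</span></p>');
$xpath = new DOMXPath($doc);

// Select the detail paragraphs directly with an attribute predicate.
foreach ($xpath->query('//p[@class="p2"]') as $p) {
    echo innerHtml($p), PHP_EOL;
}
```

The wrapping <p> tag is dropped, while both the plain text and the <span> element inside it are kept.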

Whenever I need to parse HTML, I run it through SimpleHTMLDOM:

http://simplehtmldom.sourceforge.net/

I recommend using version 1.11. For various reasons, 1.5 is rather broken.
