DOM Parser to highlight keywords not working

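This question is related to one I asked earlier. That thread is now closed and I need to ask a follow-up, so I am starting a new question; I hope that is OK.

In my earlier question I oversimplified the problem and ended up with a simple solution that does not fully work. I realized this over the past few days while implementing my code.

The problem with the solution from the previous post is that HTML tags get broken by the replace function. I have read in many posts on this site that I need to use a DOM parser. I am very new to this. I tried the code suggested by user "ircmaxell" in the post linked in the code comment below, but it does not work for me.

Here is an example of what I did: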

echo '<style type="text/css">
       .ht{
         background-color: yellow;
       }
     </style>'; 


/* taken from user ircmaxell at https://stackoverflow.com/questions/4081372/highlight-keywords-in-a-paragraph

I just modified line $highlight->setAttribute('class', 'highlight') to $highlight->setAttribute('class', 'ht') and commented the first 2 lines   */

function highlight_paragraph($string, $keyword) {
  //$string = '<p>foo<b>bar</b></p>';
  //$keyword = 'foo';
  $dom = new DomDocument();
  $dom->loadHtml($string);
  $xpath = new DomXpath($dom);
  $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
  foreach ($elements as $element) {
   foreach ($element->childNodes as $child) {
     if (!$child instanceof DomText) continue;
     $fragment = $dom->createDocumentFragment();
     $text = $child->textContent;
     $stubs = array();
     while (($pos = stripos($text, $keyword)) !== false) {
       $fragment->appendChild(new DomText(substr($text, 0, $pos)));
       $word = substr($text, $pos, strlen($keyword));
       $highlight = $dom->createElement('span');
       $highlight->appendChild(new DomText($word));
       $highlight->setAttribute('class', 'ht');
       $fragment->appendChild($highlight);
       $text = substr($text, $pos + strlen($keyword));
     }
     if (!empty($text)) $fragment->appendChild(new DomText($text));
     $element->replaceChild($fragment, $child);
   }
 }
 $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
 return $string;
}


$string = '<p>This book has been written against a background of both reckless optimism and reckless despair.</p>
<p>It holds that Progress and Doom are two sides of the same medal; that both are articles of superstition, not of faith. It was written out of the conviction that it should be possible to discover the hidden mechanics by which all traditional elements of our political and spiritual world were dissolved into a conglomeration where everything seems to have lost specific value, and has become unrecognizable for human comprehension, unusable for human purpose.</p>
<p> Hannah Arendt, The Origins of Totalitarianism (New York: Harcourt Brace Jovanovich, Inc., 1973 ed.), p.vii, Preface to the First Edition.</p>';

$keywords = array('This', 'book', 'has', 'been', 'written', 'background', 'reckless', 'optimism', 'despair.', 'holds', 'Progress', 'Doom ', 'two', 'sides', 'medal;', 'articles', 'superstition,', 'faith.', 'lost', 'Arendt,', 'Totalitarianism');

foreach ($keywords as $kw) {
  $string = highlight_paragraph($string, $kw);
}

echo $string;

echo $string returns only:

This book has been written against a background of both reckless optimism and reckless despair.

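and only the first two words, "This" and "book", are highlighted.

It should output the entire initial string with all the keywords highlighted.

I have searched Stack Overflow and Google a lot but have not found easy-to-use code for my purpose, even though many people have asked the same question before.

I really need help here. Thanks in advance!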

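You're lucky I saw this question while I was bored. ;)

The code you received as an answer does not seem to have been tested; I don't see how it could have worked correctly. Anyway, I fixed all the problems and have a working version for you, tested with PHP 5.3 on my local Apache install: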

function highlight_paragraph($string, $keyword) {
  $dom = new DOMDocument();
  $dom->loadHtml($string);

  // Search for all text blocks containing the keyword
  $xpath = new DOMXpath($dom);
  $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

  foreach ($textNodes as $textNode) {
    $fragment = $dom->createDocumentFragment();
    $text = $textNode->nodeValue;
    $stubs = array();

    while (($pos = stripos($text, $keyword)) !== false) {
      $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
      $word = substr($text, $pos, strlen($keyword));

      $highlight = $dom->createElement('span');
      $highlight->appendChild(new DOMText($word));
      $highlight->setAttribute('class', 'ht');
      $fragment->appendChild($highlight);

      $text = substr($text, $pos + strlen($keyword));
    }

    // Compare against '' so that a trailing "0" is kept (empty("0") is true)
    if ($text !== '')
      $fragment->appendChild(new DOMText($text));

    $textNode->parentNode->replaceChild($fragment, $textNode);
 }

 return $dom->saveHTML();
}
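
One thing to watch in the function above: the keyword is pasted directly into the XPath expression, so a keyword containing a double quote breaks the query, and XPath's contains() is case-sensitive while stripos() is not. The following is only a sketch of one way around that, not the original answer's method: it selects every text node under body and does the filtering in PHP. The function name highlight_text_nodes and its $class parameter are made up for this illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical variant of highlight_paragraph(): the keyword never enters
// the XPath expression, so quotes in it are harmless, and matching is
// case-insensitive throughout because only stripos() decides.
function highlight_text_nodes(DOMDocument $dom, string $keyword, string $class = 'ht'): void
{
    if ($keyword === '') {
        return; // an empty needle would never let the loop below advance
    }

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//body//text()') as $textNode) {
        $text = $textNode->nodeValue;
        if (stripos($text, $keyword) === false) {
            continue; // nothing to highlight in this text node
        }

        $fragment = $dom->createDocumentFragment();
        while (($pos = stripos($text, $keyword)) !== false) {
            // Text before the match stays plain text
            $fragment->appendChild($dom->createTextNode(substr($text, 0, $pos)));

            // The match itself is wrapped in <span class="...">
            $span = $dom->createElement('span');
            $span->setAttribute('class', $class);
            $span->appendChild($dom->createTextNode(substr($text, $pos, strlen($keyword))));
            $fragment->appendChild($span);

            $text = substr($text, $pos + strlen($keyword));
        }
        if ($text !== '') {
            $fragment->appendChild($dom->createTextNode($text));
        }

        $textNode->parentNode->replaceChild($fragment, $textNode);
    }
}

// Usage: parse once, then highlight each keyword on the same document.
$dom = new DOMDocument();
$dom->loadHTML('<p>He said "hi" to THIS crowd.</p>');
highlight_text_nodes($dom, '"hi"');
highlight_text_nodes($dom, 'this');
$result = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
```

Parsing once and passing the DOMDocument around also avoids re-parsing the HTML for every keyword. As with the code above, a later keyword that occurs inside an already highlighted word will be wrapped again, producing nested spans.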

The solution above did not work for me. Here is a very hacky but reliable workaround that highlights the keywords without breaking the HTML.

function highlight_fancy($string, $keywords=array()) {
    $dom = new DOMDocument();
    $dom->loadHtml($string);

    // Search for all text blocks containing the keyword
    $xpath = new DOMXpath($dom);
    foreach($keywords as $keyword){
        $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

        foreach ($textNodes as $textNode) {
            $fragment = $dom->createDocumentFragment();
            $text = $textNode->nodeValue;
            $stubs = array();

            while (($pos = stripos($text, $keyword)) !== false) {
                $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));

                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DOMText($word));
                $highlight->setAttribute('class', 'hl');
                $fragment->appendChild($highlight);

                $text = substr($text, $pos + strlen($keyword));
            }

            // Compare against '' so that a trailing "0" is kept (empty("0") is true)
            if ($text !== '')
                $fragment->appendChild(new DOMText($text));

            $textNode->parentNode->replaceChild($fragment, $textNode);
        }
    }
    $html= $dom->saveHTML();
    $e=explode("<body><p>",$html);
    $e=explode("</p></body>",$e[1]);
    return $e[0];
}
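
The explode() calls above only work when the markup starts with <p> and ends with </p>. Note also that the code in the question returned saveXml() of the first child of body, which is why only the first paragraph came back. A possible alternative, shown here only as a sketch, is to serialize each child of body and concatenate the results. The helper name body_inner_html is made up for this example, and passing a node to saveHTML() needs PHP 5.3.6 or later.

```php
<?php
// Hypothetical helper: return the markup of everything inside <body>,
// without the doctype, <html> and <body> wrappers that loadHTML() adds.
function body_inner_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body element
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

// Usage: works for any number and kind of top-level elements.
$dom = new DOMDocument();
$dom->loadHTML('<p>first</p><div>second</div>');
$inner = body_inner_html($dom);
echo $inner;
```

In highlight_fancy() this would take the place of the last four lines, that is, return body_inner_html($dom); instead of the saveHTML() and explode() sequence.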
