DOM Parser to highlight keywords not working

This question is related to one I asked before. Because that topic is now closed and I need to ask something further, I am starting a new question. I hope that's fine.

In my previous question I oversimplified the problem, which led to simple but not fully working solutions. I realized this recently while implementing my code.

The problem with the solutions in the previous post is that the replacement functions break the HTML tags. I have read in many posts on this site that I need to use a DOM parser. I am very unfamiliar with this, and I tried the code suggested by the user “ircmaxell” in this post (https://stackoverflow.com/questions/4081372/highlight-keywords-in-a-paragraph), but it does not work for me.
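Here is a minimal sketch that makes the problem with plain string replacement concrete. The input is made up: the keyword "book" also appears in attribute values. str_ireplace() rewrites the attributes too and produces broken markup. An XPath query over text nodes, which is what the DOM-based code in this thread builds on, only ever sees the visible text:

```php
<?php
// Made-up input: the keyword also occurs inside href and title attributes.
$html = '<p><a href="book.html" title="A book">Read the book</a></p>';
$keyword = 'book';

// Naive approach: plain string replacement also rewrites attribute values,
// giving href="<span class="ht">book</span>.html", which is broken HTML.
$naive = str_ireplace($keyword, '<span class="ht">' . $keyword . '</span>', $html);

// DOM approach: collect only the text nodes; attributes are never visited.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$texts = array();
foreach ($xpath->query('//body//text()') as $textNode) {
    $texts[] = $textNode->nodeValue;
}

echo $naive, "\n";
print_r($texts); // only the link text "Read the book" is collected
```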

Here is a sample of what I did:

echo '<style type="text/css">
       .ht{
         background-color: yellow;
       }
     </style>'; 


/* taken from user ircmaxell at https://stackoverflow.com/questions/4081372/highlight-keywords-in-a-paragraph

I only changed the line $highlight->setAttribute('class', 'highlight') to $highlight->setAttribute('class', 'ht') and commented out the first two lines   */

function highlight_paragraph($string, $keyword) {
  //$string = '<p>foo<b>bar</b></p>';
  //$keyword = 'foo';
  $dom = new DomDocument();
  $dom->loadHtml($string);
  $xpath = new DomXpath($dom);
  $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
  foreach ($elements as $element) {
    foreach ($element->childNodes as $child) {
      if (!$child instanceof DomText) continue;
      $fragment = $dom->createDocumentFragment();
      $text = $child->textContent;
      $stubs = array();
      while (($pos = stripos($text, $keyword)) !== false) {
        $fragment->appendChild(new DomText(substr($text, 0, $pos)));
        $word = substr($text, $pos, strlen($keyword));
        $highlight = $dom->createElement('span');
        $highlight->appendChild(new DomText($word));
        $highlight->setAttribute('class', 'ht');
        $fragment->appendChild($highlight);
        $text = substr($text, $pos + strlen($keyword));
      }
      if (!empty($text)) $fragment->appendChild(new DomText($text));
      $element->replaceChild($fragment, $child);
    }
  }
  $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
  return $string;
}


$string = '<p>This book has been written against a background of both reckless optimism and reckless despair.</p>
<p>It holds that Progress and Doom are two sides of the same medal; that both are articles of superstition, not of faith. It was written out of the conviction that it should be possible to discover the hidden mechanics by which all traditional elements of our political and spiritual world were dissolved into a conglomeration where everything seems to have lost specific value, and has become unrecognizable for human comprehension, unusable for human purpose.</p>
<p> Hannah Arendt, The Origins of Totalitarianism (New York: Harcourt Brace Jovanovich, Inc., 1973 ed.), p.vii, Preface to the First Edition.</p>';

$keywords = array('This', 'book', 'has', 'been', 'written', 'background', 'reckless', 'optimism', 'despair.', 'holds', 'Progress', 'Doom ', 'two', 'sides', 'medal;', 'articles', 'superstition,', 'faith.', 'lost', 'Arendt,', 'Totalitarianism');

foreach ($keywords as $kw) {
  $string = highlight_paragraph($string, $kw);
}

echo $string;

The final echo $string only outputs:

This book has been written against a background of both reckless optimism and reckless despair.

Only the first two words, 'This' and 'book', are highlighted.

I expected it to output the whole original string with all the keywords highlighted.

I have searched Stack Overflow and Google extensively and did not find easy-to-use code that does this, even though many people have asked the same thing before.

I really need help here. Thanks in advance!
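A likely cause of the first symptom, where only the first paragraph is printed, is the function's return value: it serializes $dom->getElementsByTagName('body')->item(0)->firstChild. loadHtml() wraps the fragment in <html><body>, so the first child of <body> is only the first <p>. Every later paragraph is therefore discarded on the very first call. Here is a minimal sketch with a hypothetical three-paragraph input that compares serializing firstChild with serializing every child of <body>:

```php
<?php
$dom = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>...</body></html>
$dom->loadHTML('<p>one</p><p>two</p><p>three</p>');
$body = $dom->getElementsByTagName('body')->item(0);

// What the question's function does: serialize only the first child of <body>.
$firstOnly = $dom->saveXML($body->firstChild);

// Serializing every child of <body> keeps all the paragraphs.
$all = '';
foreach ($body->childNodes as $child) {
    $all .= $dom->saveXML($child);
}

echo $firstOnly, "\n"; // <p>one</p>
echo $all, "\n";       // <p>one</p><p>two</p><p>three</p>
```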

You are lucky that I was very bored when I saw this question. ;)

The code you received as an answer doesn't seem to have been tested; I don't see how it could ever have worked correctly. Anyway, I fixed all the problems, and here is a working version, tested on my locally installed Apache server with PHP 5.3:

function highlight_paragraph($string, $keyword) {
  $dom = new DOMDocument();
  $dom->loadHtml($string);

  // Search for all text nodes that are direct children of an element containing the keyword.
  // Note: XPath contains() is case-sensitive, while stripos() below is not.
  $xpath = new DOMXpath($dom);
  $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

  // The XPath result is not a live list, so replacing nodes inside the loop is safe
  foreach ($textNodes as $textNode) {
    $fragment = $dom->createDocumentFragment();
    $text = $textNode->nodeValue;

    // Split the text at each match and wrap the matched word in <span class="ht">
    while (($pos = stripos($text, $keyword)) !== false) {
      $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
      $word = substr($text, $pos, strlen($keyword));

      $highlight = $dom->createElement('span');
      $highlight->appendChild(new DOMText($word));
      $highlight->setAttribute('class', 'ht');
      $fragment->appendChild($highlight);

      $text = substr($text, $pos + strlen($keyword));
    }

    // Keep the remaining text; strlen() instead of empty() so a trailing "0" is not dropped
    if (strlen($text) > 0)
      $fragment->appendChild(new DOMText($text));

    $textNode->parentNode->replaceChild($fragment, $textNode);
  }

  // saveHTML() returns the whole document, including the DOCTYPE, <html> and <body>
  return $dom->saveHTML();
}
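As far as I can tell, the key change in this version is where the text nodes come from. The question's code walks $element->childNodes, which is a live list, and calls replaceChild() on it in the middle of the foreach. Once the current node is detached, the loop can stop early, which would explain why only 'This' and 'book' got highlighted. The XPath query used here returns a list that the replacements do not affect. If you would rather keep a childNodes loop, one common fix is to copy the children into a plain array before modifying anything. Here is a minimal sketch; the function name wrap_text_nodes is hypothetical:

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $keyword
// in the direct text children of $element in <span class="ht">.
function wrap_text_nodes(DOMDocument $dom, DOMElement $element, $keyword) {
    // Copy the live childNodes list into a plain array first, so that
    // replacing nodes below cannot disturb the iteration.
    $children = array();
    foreach ($element->childNodes as $child) {
        $children[] = $child;
    }

    foreach ($children as $child) {
        if (!($child instanceof DOMText)) {
            continue; // leave elements such as <b> untouched
        }
        $text = $child->nodeValue;
        $fragment = $dom->createDocumentFragment();
        while (($pos = stripos($text, $keyword)) !== false) {
            $fragment->appendChild($dom->createTextNode(substr($text, 0, $pos)));
            $span = $dom->createElement('span');
            $span->setAttribute('class', 'ht');
            $span->appendChild($dom->createTextNode(substr($text, $pos, strlen($keyword))));
            $fragment->appendChild($span);
            $text = substr($text, $pos + strlen($keyword));
        }
        if (strlen($text) > 0) {
            $fragment->appendChild($dom->createTextNode($text));
        }
        $element->replaceChild($fragment, $child);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<p>has <b>book</b> has</p>');
$p = $dom->getElementsByTagName('p')->item(0);
wrap_text_nodes($dom, $p, 'has');
$result = $dom->saveXML($p);
echo $result; // <p><span class="ht">has</span> <b>book</b> <span class="ht">has</span></p>
```

Copying the list costs one extra array, but it makes the loop independent of whatever replaceChild() does to the tree while the loop is running.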

The solution above didn't work. Here is a really hacky but solid workaround that highlights the keywords without breaking the HTML.

function highlight_fancy($string, $keywords=array()) {
    $dom = new DOMDocument();
    $dom->loadHtml($string);

    // Search for all text blocks containing each keyword
    $xpath = new DOMXpath($dom);
    foreach($keywords as $keyword){
        $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

        foreach ($textNodes as $textNode) {
            $fragment = $dom->createDocumentFragment();
            $text = $textNode->nodeValue;

            while (($pos = stripos($text, $keyword)) !== false) {
                $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));

                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DOMText($word));
                // 'ht' matches the .ht rule in the question's stylesheet
                $highlight->setAttribute('class', 'ht');
                $fragment->appendChild($highlight);

                $text = substr($text, $pos + strlen($keyword));
            }

            // strlen() instead of empty() so a trailing "0" is not dropped
            if (strlen($text) > 0)
                $fragment->appendChild(new DOMText($text));

            $textNode->parentNode->replaceChild($fragment, $textNode);
        }
    }
    $html = $dom->saveHTML();
    // Hack: keep only what lies between "<body><p>" and "</p></body>".
    // Assumes the input starts with <p> and ends with </p>; that first <p> and last </p> are dropped.
    $e = explode("<body><p>", $html);
    if (count($e) < 2) return $html; // markers not found: return the full document
    $e = explode("</p></body>", $e[1]);
    return $e[0];
}
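The explode() step assumes the output contains exactly "<body><p>" and "</p></body>". It breaks if the input starts or ends with anything other than a paragraph, and it also strips that outer <p>...</p> pair. A less fragile sketch serializes each child of <body> separately. It assumes PHP 5.3.6 or later, the version in which saveHTML() gained its optional node argument. The helper name body_inner_html is hypothetical:

```php
<?php
// Hypothetical helper: returns the HTML of everything inside <body>,
// without the DOCTYPE/<html>/<body> wrapper that saveHTML() adds.
// Passing a node to saveHTML() requires PHP 5.3.6 or later.
function body_inner_html(DOMDocument $dom) {
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into <body>
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>first</p><div>second</div>');
$inner = body_inner_html($dom);
echo $inner;
```

Inside highlight_fancy() you would return body_inner_html($dom) instead of running the two explode() calls. Depending on the libxml version, saveHTML() on a single node may add line breaks between block elements, which does not change how the HTML renders.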
