
How to highlight matches in UTF-8 text while ignoring diacritics in PHP?

I cannot find a way to highlight matches in PHP while ignoring UTF-8 diacritics, so that a keyword typed without accents still matches the accented word in the text.

Code example:

$text = "Lorem Ipsum – tas ir teksta salikums, kuru izmanto poligrāfijā un maketēšanas darbos. Lorem Ipsum ir kļuvis par vispārpieņemtu teksta aizvietotāju kopš 16. gadsimta sākuma. Tajā laikā kāds nezināms iespiedējs izveidoja teksta fragmentu, lai nodrukātu grāmatu ar burtu paraugiem.";
$keywordsNotWorking = ["poligrafija", "kops"];
$keywordsWorking = ["poligrāfijā", "kopš"];

function highlightFoundText($text, $keywords, $tag = "b")
{
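  // Iterate over the $keywords parameter (the original looped over an undefined $keyword).
  foreach ($keywords as $key){
    // Pass the delimiter to preg_quote so a "/" inside a keyword is escaped too.
    $text = preg_replace("/\p{L}*?".preg_quote($key, "/")."\p{L}*/ui", "<".$tag.">$0</".$tag.">", $text);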
  }
  return $text;
}

With $keywordsWorking everything works, but with $keywordsNotWorking no matches are found. How can I highlight matches while ignoring the diacritics on UTF-8 characters?
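One way to make the regex above tolerate missing accents is to expand each base letter of the keyword into a character class that also lists its accented forms. The following is only a sketch under assumptions: the helper names `accentInsensitivePattern` and `highlightIgnoringAccents` are hypothetical, the variant table covers Latvian letters only, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: builds a regex fragment in which every base letter
// of the keyword also matches its accented variant.
function accentInsensitivePattern(string $keyword): string
{
    // Assumed mapping for Latvian letters; extend it for other alphabets.
    $variants = [
        'a' => '[aā]', 'c' => '[cč]', 'e' => '[eē]', 'g' => '[gģ]',
        'i' => '[iī]', 'k' => '[kķ]', 'l' => '[lļ]', 'n' => '[nņ]',
        's' => '[sš]', 'u' => '[uū]', 'z' => '[zž]',
    ];

    $pattern = '';
    // Split on UTF-8 character boundaries so multi-byte letters stay intact.
    foreach (preg_split('//u', $keyword, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $lower = mb_strtolower($char, 'UTF-8');
        // Letters without a known variant are escaped and matched literally.
        $pattern .= $variants[$lower] ?? preg_quote($char, '/');
    }
    return $pattern;
}

// Same shape as highlightFoundText() from the question, with the keyword
// replaced by its accent-tolerant pattern.
function highlightIgnoringAccents(string $text, array $keywords, string $tag = 'b'): string
{
    foreach ($keywords as $keyword) {
        if ($keyword === '') {
            continue;
        }
        $pattern = '/\p{L}*?' . accentInsensitivePattern($keyword) . '\p{L}*/ui';
        $text = preg_replace($pattern, "<{$tag}>$0</{$tag}>", $text);
    }
    return $text;
}

echo highlightIgnoringAccents('izmanto poligrāfijā un kopš 16. gadsimta', ['poligrafija', 'kops']);
// prints: izmanto <b>poligrāfijā</b> un <b>kopš</b> 16. gadsimta
```

As in the original function, a later keyword can still match inside a tag inserted for an earlier one (for example the keyword "b" with the default tag), so this sketch is best suited to keywords that do not collide with the tag name.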

I eventually arrived at a working solution. I am posting it as an answer in case somebody runs into the same issue.

class Highlighter
{
    private $_text;
    private $_keywords;

    private $keywords;
    private $text;

    private $tag = "b";

    public function highlight($text, $keywords)
    {
        $this->text = $text;
        $this->keywords = (array) $keywords;

        // Count the normalized array: a plain string argument is not countable.
        if(count($this->keywords) > 0)
        {
            $this->prepareString();
            $this->highlightStrings();
        }

        return $this->text;
    }

    private function unicodeSymbols()
    {
        return [
            'ā' => 'a',
            'č' => 'c',
            'ē' => 'e',
            'ī' => 'i',
            'ķ' => 'k',
            'ļ' => 'l',
            'ņ' => 'n',
            'š' => 's',
            'ū' => 'u',
            'ž' => 'z'
        ];
    }

    private function clearVars()
    {
        $this->_text = null;
        $this->_keywords = [];
    }

    private function prepareString()
    {
        $this->clearVars();

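        // Lowercase with mb_strtolower first so uppercase accented letters (e.g. 'Š') are folded too.
        $this->_text = strtr( mb_strtolower($this->text, 'UTF-8'), $this->unicodeSymbols() );

        foreach ($this->keywords as $keyword)
        {
            $this->_keywords[] = strtr( mb_strtolower($keyword, 'UTF-8'), $this->unicodeSymbols() );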
        }
    }

    private function highlightStrings()
    {
        foreach ($this->_keywords as $keyword)
        {

            if(strlen($keyword) === 0) continue;

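            // find cleared keyword in cleared text (character offset, not byte offset).
            $pos = mb_strpos($this->_text, $keyword, 0, 'UTF-8');

            if($pos !== false)
            {

                $keywordLength = mb_strlen($keyword, 'UTF-8');

                // find original keyword: each mapped letter is one character in both texts,
                // so the same character offset and length apply to the original text.
                $originalKeyword = mb_substr($this->text, $pos, $keywordLength, 'UTF-8');

                // highlight in both texts to keep their character offsets in step for the next keyword.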
                $this->text = str_replace($originalKeyword, "<{$this->tag}>".$originalKeyword."</{$this->tag}>", $this->text);
                $this->_text = str_replace($keyword, "<{$this->tag}>".$keyword."</{$this->tag}>", $this->_text);
            }

        }
    }

    public function setTag($tag)
    {
        $this->tag = $tag;
    }
}
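The class above locates only the first occurrence of each keyword and then relies on str_replace to tag the rest. As a rough alternative, the same fold-and-map-back idea can be written as a standalone function that records every match by character offset and inserts the tags from the end of the text backwards. This is a sketch, not the author's method: `foldLatvian` and `highlightAll` are hypothetical names, the letter table is an assumption limited to Latvian, and it assumes that lowercasing does not change the number of characters (true for these letters, not for every script).

```php
<?php
// Hypothetical helper: lowercases and strips Latvian diacritics.
// Every mapped letter is one character in and one character out,
// so character offsets in the folded string match the original string.
function foldLatvian(string $s): string
{
    return strtr(mb_strtolower($s, 'UTF-8'), [
        'ā' => 'a', 'č' => 'c', 'ē' => 'e', 'ģ' => 'g', 'ī' => 'i', 'ķ' => 'k',
        'ļ' => 'l', 'ņ' => 'n', 'š' => 's', 'ū' => 'u', 'ž' => 'z',
    ]);
}

function highlightAll(string $text, array $keywords, string $tag = 'b'): string
{
    $folded = foldLatvian($text);
    $ranges = []; // [start, length] pairs, measured in characters

    foreach ($keywords as $keyword) {
        $needle = foldLatvian($keyword);
        $len = mb_strlen($needle, 'UTF-8');
        if ($len === 0) {
            continue;
        }
        // Collect every occurrence of the folded keyword in the folded text.
        $offset = 0;
        while (($pos = mb_strpos($folded, $needle, $offset, 'UTF-8')) !== false) {
            $ranges[] = [$pos, $len];
            $offset = $pos + $len;
        }
    }

    // Process matches from the end so earlier offsets stay valid after insertion.
    usort($ranges, function ($a, $b) {
        return $b[0] <=> $a[0];
    });

    $limit = PHP_INT_MAX; // start of the most recently tagged range
    foreach ($ranges as [$pos, $len]) {
        if ($pos + $len > $limit) {
            continue; // skip matches that overlap an already tagged range
        }
        $text = mb_substr($text, 0, $pos, 'UTF-8')
            . "<{$tag}>" . mb_substr($text, $pos, $len, 'UTF-8') . "</{$tag}>"
            . mb_substr($text, $pos + $len, null, 'UTF-8');
        $limit = $pos;
    }
    return $text;
}

echo highlightAll('Ipsum – tas poligrāfijā un kopš 16.', ['poligrafija', 'kops']);
// prints: Ipsum – tas <b>poligrāfijā</b> un <b>kopš</b> 16.
```

Because the search runs on a folded copy and the tags are inserted into the original, the output keeps the original casing and accents. Working in character offsets (mb_strpos, mb_substr) rather than byte offsets matters as soon as the text contains an unmapped multi-byte character such as the dash in the sample sentence.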

