
International case insensitive search and replace

I have a PHP page that displays search results. Searches are case insensitive. For example, the user might search for the word "FÖR" in any case. One corresponding match is the text "öga för öga". When a match is found, I want to display the match but colour it differently using CSS. My current solution has the line preg_match_all("/$needle+/i", $haystack, $matches); which performs a case-insensitive regular expression search. It works fine for Latin characters but fails for international characters; in particular, I'm using Swedish, Greek and Hebrew characters. How can I make it work for those as well? Example code:

private function highlightStr($needle, $haystack) {
    // Return $haystack unchanged if either string is empty; nothing to do.
    if (strlen($haystack) < 1 || strlen($needle) < 1) {
        return $haystack;
    }
    preg_match_all("/$needle+/i", $haystack, $matches);
    if (is_array($matches[0]) && count($matches[0]) >= 1) {
        foreach ($matches[0] as $match) {
            $haystack = str_replace($match, '<span class="searchHighlight">'.$match.'</span>', $haystack);
        }
    }
    return $haystack;
}

Example search: FÖR

Example match: öga för öga

Desired result: öga <span class="searchHighlight">för</span> öga

Edit: I got it working when I changed the code to:

preg_match_all("/\b{$needle}\b/ui", $haystack, $matches);
if (is_array($matches[0]) && count($matches[0]) >= 1) {
    $unique = array_count_values($matches[0]);
    foreach ($unique as $match => $value) {
        $haystack = preg_replace("/\b{$match}\b/ui", '<span class="searchHighlight">'.$match.'</span>', $haystack);
    }
}

The /ui at the end of the pattern passed to preg_match_all and preg_replace specifies Unicode (u) and case-insensitive (i) matching. (\b specifies a word boundary.)
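To make the effect of the modifiers concrete, here is a minimal sketch comparing /i with /ui on the example from the question. It assumes the source file and both strings are UTF-8 encoded; the variable names are only illustrative.

```php
<?php
// Assumption: this file and the strings below are UTF-8 encoded.
$haystack = 'öga för öga';

// Without "u", PCRE compares bytes, and "i" only folds ASCII letters
// in the default locale, so the multi-byte Ö/ö pair is not folded.
$bytewise = preg_match('/FÖR/i', $haystack);

// With "u", pattern and subject are read as UTF-8, so Ö and ö are
// treated as the same letter under "i". \b anchors at word boundaries.
$unicode = preg_match('/\bFÖR\b/ui', $haystack, $m);

var_dump($bytewise, $unicode, $m[0] ?? null);
```

The captured text in $m[0] keeps the casing found in the haystack, not the casing of the search term.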

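To do a search and replace in PHP you can use preg_replace. In the replacement string, $0 stands for the whole matched text, so the original casing is kept: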

$result = preg_replace('/FÖR/ui', '<span class="searchHighlight">$0</span>', $text);
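When the search term comes from the user, it should probably be escaped before it goes into the pattern. The following is a sketch of one way to do that, not the original poster's method; highlightMatches is a hypothetical helper name, and UTF-8 input is assumed.

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $needle
// in a highlight span. Assumes both strings are valid UTF-8.
function highlightMatches(string $needle, string $haystack): string
{
    if ($needle === '' || $haystack === '') {
        return $haystack; // nothing to search for or in
    }

    // preg_quote escapes regex metacharacters (and the "/" delimiter),
    // so the user's input is matched literally.
    $pattern = '/' . preg_quote($needle, '/') . '/ui';

    // $0 re-inserts the text as it appears in the haystack.
    $result = preg_replace($pattern, '<span class="searchHighlight">$0</span>', $haystack);

    // preg_replace returns null on error, e.g. for invalid UTF-8 input.
    return $result ?? $haystack;
}

echo highlightMatches('FÖR', 'öga för öga'), "\n";
```

A single preg_replace call handles all occurrences in one pass, which avoids re-matching text inside spans that an earlier iteration of a loop has already inserted.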

To match a Unicode letter with a regex in PHP you can use \p{L}.
Is the text to be matched by the regex HTML-escaped, as in f&ouml;r?
If so, you would have to unescape it before \p{L} can match it.
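As a sketch of the unescaping step, assuming the text arrives with named HTML entities and should end up as UTF-8, html_entity_decode can be applied before matching. The sample string and variable names are illustrative only.

```php
<?php
// Assumption: the input contains HTML entities such as &ouml;.
$escaped = '&ouml;ga f&ouml;r &ouml;ga';

// Decode entities into real UTF-8 characters first.
$decoded = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// \p{L}+ matches runs of Unicode letters; "u" is required for \p{L}
// to see multi-byte characters as single letters.
preg_match_all('/\p{L}+/u', $decoded, $words);

// On the escaped text the same pattern only sees the ASCII fragments
// around "&" and ";", so whole words are not recovered.
preg_match_all('/\p{L}+/u', $escaped, $fragments);

print_r($words[0]);
print_r($fragments[0]);
```

If the highlighted result is written back into an HTML page, the decoded text would normally need to be escaped again for output, outside the inserted span tags.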
