
Splitting a string into words with Swedish characters

I'm trying to split a string of text into words using the PHP function preg_split.

$words = preg_split('/\W/u',$text);

It works fine except for Swedish characters like åäö. Running the text through utf8_encode or utf8_decode doesn't help either. My guess is that preg_split only works with single-byte characters and that the Swedish characters are multibyte. Is there another way to do it?

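Why are you paying any attention to specific characters? If the words are separated by spaces, you can split on the space and it doesn't matter which characters the words contain: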

$text = "Jag har hört så mycket om dig.";
$words = explode(" ", $text);
/*
Array
(
    [0] => Jag
    [1] => har
    [2] => hört
    [3] => så
    [4] => mycket
    [5] => om
    [6] => dig.
)
*/
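
Note that explode keeps the trailing punctuation ("dig." above). If that matters, one possible approach is to keep preg_split but describe a word with Unicode property classes instead of \W. This is only a sketch: split_words is a hypothetical helper name, and it assumes the input is valid UTF-8 and that the file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper, not part of the original answers.
function split_words(string $text): array
{
    // \p{L} matches any Unicode letter and \p{N} any Unicode digit.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // PREG_SPLIT_NO_EMPTY drops the empty pieces left by leading or
    // trailing separators.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure, e.g. when $text is not valid UTF-8.
    return $words === false ? [] : $words;
}

$words = split_words("Jag har hört så mycket om dig.");
print_r($words);
```

For the sentence above this yields Jag, har, hört, så, mycket, om, dig, with the final period removed. Spelling out the letter class avoids depending on how \W is interpreted by a particular PCRE build.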

mb_split to the rescue. (I had problems with this myself some time ago and only just found the answer.)

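// Tell the mb_ereg family of functions that the text is UTF-8.
mb_regex_encoding('UTF-8');
// Split on runs of non-word characters and keep the result.
$words = mb_split('\W+', $text);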

HTH


 