I'm trying to split a string of text into words using the PHP function preg_split.
$words = preg_split('/\W/u',$text);
It works fine except for Swedish characters like åäö. Calling utf8_encode or utf8_decode doesn't help either. My guess is that preg_split only works with single-byte characters and that the Swedish characters are multibyte. Is there another way to do it?
Why are you paying any attention to specific characters?
$text = "Jag har hört så mycket om dig.";
$words = explode(" ", $text);
/*
Array
(
[0] => Jag
[1] => har
[2] => hört
[3] => så
[4] => mycket
[5] => om
[6] => dig.
)
*/
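Note that splitting on spaces leaves punctuation attached (the last element above is "dig." with the period). If that matters, one possible variation is to split on runs of anything that is not a letter or digit, using Unicode property classes. This is only a sketch: the pattern is my own choice rather than something from the question, and it assumes $text really is valid UTF-8.

```php
<?php
// Sketch: split on runs of characters that are neither letters nor digits.
// Assumption: $text is valid UTF-8 (and this file is saved as UTF-8).
$text = "Jag har hört så mycket om dig.";

// \p{L} matches any Unicode letter, \p{N} any Unicode number.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
// PREG_SPLIT_NO_EMPTY drops empty pieces, such as the one that would
// otherwise follow the trailing period.
$words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

With the sample sentence this should give seven words, with "hört" and "så" intact and no period on "dig".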
mb_split to the rescue (had problems myself with these some time ago, just now found the answer :)
mb_regex_encoding('UTF-8');
mb_split('\W', $text);
HTH
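One more thing that may be worth checking, since the question mentions that utf8_encode and utf8_decode made no difference: with the /u modifier, preg_split fails outright (returns false) when the subject is not valid UTF-8, for example when the text is actually ISO-8859-1. The snippet below is a small sketch of how to tell the two cases apart. The sample strings and variable names are made up for illustration.

```php
<?php
// "hört så" as ISO-8859-1 bytes: 0xF6 and 0xE5 on their own are not valid UTF-8.
$latin1 = "h\xF6rt s\xE5";

// With /u, an invalid UTF-8 subject makes the call fail instead of splitting.
$result = preg_split('/\W/u', $latin1);
$failed = ($result === false && preg_last_error() === PREG_BAD_UTF8_ERROR);

// The same words as real UTF-8 split as expected.
// \P{L}+ means "one or more characters that are not letters".
$utf8  = "h\u{00F6}rt s\u{00E5}";
$words = preg_split('/\P{L}+/u', $utf8, -1, PREG_SPLIT_NO_EMPTY);

var_dump($failed);
print_r($words);
```

If $failed comes out true for your real input, the text needs converting to UTF-8 from whatever encoding it is really in before any of the splitting approaches here will behave.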