Matching some unusual characters with `\w`

Question

preg_match("/\w+/", $s, $matches);

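I have the PHP code above. I use it to match the words in a string. It works well except in one case.

Example:

'This is a word' should match as {'This','is','a','word'}

'Bös Tüb' should match as {'Bös','Tüb'}

The first example works, but the second does not. Instead, it returns {'B','s','T','b'}; it does not treat ö and ü as word characters.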

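Question

How can I match ö and ü, along with any other characters commonly used in names (they may be unusual; this is about German and Turkish names)? Should I add them manually ( /[a-zA-Z and all others as unicode]/ )?

Edit

Note that there are many \n, \r and ' ' characters between the words. That is why I am using a regular expression.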

Answer 1

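You can use the u modifier so that the pattern and the subject are treated as UTF-8 and \w also matches Unicode letters. Decoding the match with utf8_decode() afterwards is only useful if the rest of your code expects ISO-8859-1 rather than UTF-8; note that utf8_decode() is deprecated as of PHP 8.2.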

$s = 'Bös Tüb';
preg_match("/\w+/u", $s, $matches); // the 'u' modifier enables UTF-8 matching
var_dump(utf8_decode($matches[0])); // first match (Bös), converted to ISO-8859-1
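
Note that preg_match() stops after the first match, so the snippet above only yields 'Bös'. To get every word, as the question asks, something like the following sketch may help. It assumes the source file and the input string are UTF-8 encoded; the sample string and the variable name $words are made up for illustration.

```php
<?php
// Sample input with German and Turkish names, separated by spaces and \r\n.
$s = "Bös Tüb\r\nÇağrı  Öztürk";

// preg_match_all() collects every match instead of only the first one.
// With the 'u' modifier, \w also matches letters such as ö, ü, ç and ğ.
preg_match_all('/\w+/u', $s, $matches);

// $matches[0] holds the full-pattern matches.
$words = $matches[0];

print_r($words); // contains: Bös, Tüb, Çağrı, Öztürk
```

Because the result stays in UTF-8, no utf8_decode() call is needed when the rest of the application also works in UTF-8.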

Answer 2

If you only need to split on spaces, you can use PHP's explode() function, for example:

$some_string = 'test some words';
$words_arr = explode(' ', $some_string);
var_dump($words_arr);

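This works regardless of which characters the string contains, because explode() only looks for the delimiter and leaves everything else untouched.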
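
explode(' ', ...) splits on a single space only, so runs of spaces produce empty entries and \n or \r are left inside the words. Since the question mentions those separators, a possible variant is to split on any run of whitespace with preg_split(). This is a sketch, not part of the original answer; the sample string and the name $parts are assumptions.

```php
<?php
// Words separated by a mix of spaces, \r\n and \n, as described in the question.
$some_string = "Bös  Tüb\r\nis\na name";

// \s+ matches one or more whitespace characters (space, \r, \n, tab, ...).
// PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing whitespace.
$parts = preg_split('/\s+/u', $some_string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts); // contains: Bös, Tüb, is, a, name
```

Like explode(), this never inspects the non-whitespace characters, so umlauts and other accented letters pass through unchanged.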

Edit: you can also try:

preg_match("/\w+/u", $s, $matches);

to handle Unicode.
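
As for listing the characters by hand: with the u modifier, \w still accepts digits and the underscore. If only letters should count as part of a name, the Unicode property \p{L} (any letter, in any script) can be used instead of a hand-written character class. A minimal sketch, assuming UTF-8 input; the sample string and the name $names are illustrative only.

```php
<?php
// Input containing an underscore and digits, which \w would accept.
$s = "Bös_Tüb 42 Öztürk";

// \p{L}+ matches runs of letters only; the 'u' modifier is required
// for the pattern and the subject to be interpreted as UTF-8.
preg_match_all('/\p{L}+/u', $s, $matches);

$names = $matches[0];

print_r($names); // contains: Bös, Tüb, Öztürk
```

Names with hyphens or apostrophes would need those characters added to the class explicitly, for example [\p{L}'-]+.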