PHP: extract words that contain special characters from a string

I have a string:

$str = " Côte-d'azure ! (3000) limousin - limousine  ";

I need to extract some of the words and put them in an array, so that I finally get:

array (
        0 => "Côte-d'azure",
        1 => "limousin",
        2 => "limousine"
     );

So I tried:

preg_match_all("/[a-zA-Z]+/", $str, $all);

But this ignores the special characters ô, ' and -.

Any advice, please?

Use Unicode mode (the u modifier) and Unicode character properties:

preg_match_all('/\p{L}[\p{L}\\\\\'-]+/u', $str, $all);

This requires one (Unicode) letter and then matches as many further Unicode letters, backslashes, hyphens and apostrophes as possible. If you want other punctuation characters not to separate a word, include them in the character class.
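As a quick check, here is a minimal sketch that applies the pattern to the sample string from the question. It assumes the source file is saved as UTF-8 and passes $str to preg_match_all directly, without any escaping step; the variable $words is introduced only for illustration.

```php
<?php
// Sample input from the question (the file is assumed to be UTF-8 encoded).
$str = " Côte-d'azure ! (3000) limousin - limousine  ";

// One Unicode letter, followed by one or more letters,
// backslashes, apostrophes or hyphens.
preg_match_all('/\p{L}[\p{L}\\\\\'-]+/u', $str, $all);

// $all[0] holds the full-pattern matches.
$words = $all[0];
print_r($words);
// Array
// (
//     [0] => Côte-d'azure
//     [1] => limousin
//     [2] => limousine
// )
```

Because of the trailing +, the pattern only matches words of at least two characters; changing + to * would also pick up single-letter words.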

Note the 5 backslashes. Three of them are removed when PHP parses the single-quoted string literal: the first two pairs each collapse into one backslash, and the last backslash escapes the ' . So the regex engine receives only 2 backslashes, which it interprets as one literal backslash. Unfortunately, in a PHP string literal you generally need four backslashes to represent one literal backslash in the pattern.
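To see the two levels of escaping, the pattern can be printed after PHP has parsed the string literal. This is only an illustrative sketch; the names $pattern and $m and the test subject a\b are made up for the example.

```php
<?php
// The pattern exactly as it is typed in the PHP source (single-quoted literal).
$pattern = '/\p{L}[\p{L}\\\\\'-]+/u';

// Printing it shows what the regex engine actually receives:
// the 5 typed backslashes have become 2 backslashes plus an apostrophe.
echo $pattern, PHP_EOL; // /\p{L}[\p{L}\\'-]+/u

// Inside the character class, those 2 backslashes mean one literal backslash,
// so a backslash that follows a letter is kept as part of the match.
preg_match($pattern, 'a\\b', $m); // the subject is the 3 characters a\b
echo $m[0], PHP_EOL;              // a\b
```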

Try:

if (preg_match('/[^a-zA-Z0-9]+/', $your_string, $matches))
{
  echo '  symbol encountered !!';
}
