Regex to match a string that may contain Chinese characters

I'm trying to write a regular expression that matches a string that may include Chinese characters. Examples:

hahdj5454_fd.fgg"
example.com/list.php?keyword=关键字
example.com/list.php?keyword=php

I am using this expression:

$matchStr =  '/^[a-z 0-9~%.:_\-\/[^x7f-xff]+$/i';
$str      =  "http://example.com/list.php?keyword=关键字";

if ( ! preg_match($matchStr, $str)){
    exit('WRONG');
}else{
    echo "RIGHT"; 
}

It matches plain English strings such as dasdsdsfds or http://example.com/list.php, but it doesn't match strings containing Chinese characters. How can I resolve this?

Assuming you want to extend the set of letters that this regex matches from ASCII to all Unicode letters, you can use:

$matchStr =  '#^[\pL 0-9~%.:_/-]+$#u';

I've removed the [^x7f-xff part, which didn't make any sense: in your regex it would have matched an opening bracket, a caret, and some ASCII characters that were already covered by the a-z and 0-9 parts of that character class.
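
As a rough check of the pattern above, the sketch below runs it against strings from the question. The u modifier makes PCRE treat both pattern and subject as UTF-8, which is what lets \pL match Chinese letters. Note that the character class contains neither ? nor =, so a URL with a query string is still rejected; the $withQuery pattern and the matches() helper are assumptions added for illustration, not part of the original answer.

```php
<?php
// Pattern from the answer: any Unicode letter plus a fixed set of ASCII characters.
$strict = '#^[\pL 0-9~%.:_/-]+$#u';

// Hypothetical variant (an assumption, not from the answer) that also allows "?" and "=".
$withQuery = '#^[\pL 0-9~%.:_/?=-]+$#u';

// Small helper so each check reads as a boolean.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches($strict, 'hahdj5454_fd.fgg'));                       // bool(true)
var_dump(matches($strict, 'http://example.com/list.php'));            // bool(true)
var_dump(matches($strict, '关键字'));                                  // bool(true)

// "?" and "=" are not in the answer's character class, so this fails:
var_dump(matches($strict, 'http://example.com/list.php?keyword=关键字'));    // bool(false)

// With the two extra characters allowed, the full URL passes:
var_dump(matches($withQuery, 'http://example.com/list.php?keyword=关键字')); // bool(true)
```

If the strings to validate are full URLs, the set of allowed punctuation probably needs to be widened further (for example & or #), depending on what you want to accept.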

This works:

$str = "http://mysite/list.php?keyword=关键字";

if (preg_match('/[\p{Han}]/simu', $str)) {
    echo "Contains Chinese Characters"; 
}else{
    exit('WRONG'); // Does not contain Chinese characters
}
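
Note that this snippet only tests whether at least one Han character is present; it does not validate the rest of the string. A minimal sketch of that presence check, plus a count of the Han characters, might look like the following. The function names containsHan() and countHan() are hypothetical, chosen here for illustration.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original answer.

// True if the string contains at least one Han (Chinese) character.
function containsHan(string $s): bool
{
    return preg_match('/\p{Han}/u', $s) === 1;
}

// Number of Han characters in the string (0 if the match fails).
function countHan(string $s): int
{
    return (int) preg_match_all('/\p{Han}/u', $s);
}

$str = 'http://mysite/list.php?keyword=关键字';

var_dump(containsHan($str));                                  // bool(true)
var_dump(containsHan('http://mysite/list.php?keyword=php'));  // bool(false)
echo countHan($str), "\n";                                    // 3
```

Only the u modifier is needed here; the s, i and m modifiers in the snippet above have no effect on a pattern that consists of a single \p{Han} class.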
