I am trying to process a string that contains letters, digits, Chinese characters and some punctuation, and keep only the digits, letters and Chinese characters, as shown below.
raw string
a>b%%c##1@23测$$试??\\:.##,,??!!
result
abc123测试
For Chinese, preg_replace('/\P{Han}+/u', '', $text)
works perfectly
测试 // result
And for digits and letters, preg_replace('/[^0-9a-zA-Z]/', '', $text)
works too.
abc123 // result
But how can I combine them?
Why doesn't the following work as expected?
preg_replace('/[\P{Han}]|[0-9a-zA-Z]/u', '', $raw);
Thanks a lot to anyone who can help!
You need
preg_replace('~[^0-9a-zA-Z\p{Han}]+~u', '', $raw)
See the regex demo.
The [^0-9a-zA-Z\p{Han}]+
pattern is a negated character class that matches one or more characters other than ASCII digits, ASCII letters and characters in the Han script (\p{Han}).
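It is important to use the u
modifier with this pattern: it makes PCRE treat both the pattern and the subject string as UTF-8, so multibyte characters such as 测 are matched as whole characters rather than as individual bytes.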
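As for why the alternation in the question fails, here is a small sketch comparing the two patterns on the sample string. The variable names $attempt and $fixed are only for illustration. The alternation removes a character if it is non-Han or an ASCII alphanumeric, and since ASCII letters and digits are themselves non-Han, only the Han characters survive:

```php
<?php
$raw = 'a>b%%c##1@23测$$试??\\:.##,,??!!';

// Pattern from the question: a character is removed if it is
// non-Han OR an ASCII letter/digit. Letters and digits are already
// non-Han, so the second alternative adds nothing.
$attempt = preg_replace('/[\P{Han}]|[0-9a-zA-Z]/u', '', $raw);

// Single negated class: a character is removed only if it is
// none of: ASCII digit, ASCII letter, Han character.
$fixed = preg_replace('~[^0-9a-zA-Z\p{Han}]+~u', '', $raw);

echo $attempt, "\n"; // 测试
echo $fixed, "\n";   // abc123测试
```

In other words, the "keep" sets have to be combined inside one negated class rather than by alternating two "remove" patterns.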
See the PHP demo:
$raw = 'a>b%%c##1@23测$$试??\\:.##,,??!!';
echo preg_replace('~[^0-9a-zA-Z\p{Han}]+~u', '', $raw);
// => abc123测试
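If the input may come from an untrusted source, it can help to wrap the call, because preg_replace() returns null on failure, which with the u modifier includes a subject that is not valid UTF-8. A minimal sketch follows; keepAlnumAndHan() is a hypothetical helper name and throwing an exception is just one possible way to report the error:

```php
<?php
// Hypothetical helper (not part of PHP): keeps only ASCII digits,
// ASCII letters and Han characters.
function keepAlnumAndHan(string $text): string
{
    $clean = preg_replace('~[^0-9a-zA-Z\p{Han}]+~u', '', $text);
    if ($clean === null) {
        // null signals a PCRE error, e.g. malformed UTF-8 in $text.
        throw new InvalidArgumentException(
            'preg_replace failed with error code ' . preg_last_error()
        );
    }
    return $clean;
}

echo keepAlnumAndHan('a>b%%c##1@23测$$试??!!'), "\n"; // abc123测试
```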