Regex to reject non-English characters?

Is there a simple regex that will catch all non-English characters? It would need to allow common punctuation and symbols, but reject characters from other scripts such as Russian, Japanese, etc.

Looking for something to work in PHP.

Since you're referring to addresses in your comment, the input might contain digits too. So:

$clean = preg_replace('/[^[:alpha:][:punct:][:digit:]]/u', '', utf8_encode($input));

This should strip the unwanted characters. The [:alpha:] class only works as intended if your locale is set up correctly, though. If, for example, it's set to de_DE, not only "a" through "z" count as letters, but also "exotic" ones such as "ä", "ö", "è", and the like.

Also, since you want to exclude "Russian, Japanese, etc.", note the u modifier: the input has to be UTF-8 encoded, otherwise the match breaks and gives you wrong results.
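
If the locale dependence of [:alpha:] is a concern, one option is to spell out the ASCII range explicitly. The following is only a sketch under that assumption: the function name stripNonEnglish is made up for illustration, and it treats "English" as printable ASCII (\x20 through \x7E), which covers letters, digits, spaces and common punctuation. It also checks that the input is valid UTF-8 first, since the u modifier fails on invalid input.

```php
<?php
// Hypothetical helper (not from the original answer): removes everything
// outside printable ASCII, independent of the current locale.
function stripNonEnglish(string $input): ?string
{
    // An empty pattern with the u modifier only matches valid UTF-8,
    // so this acts as a cheap encoding check.
    if (preg_match('//u', $input) !== 1) {
        return null; // caller decides how to handle badly encoded input
    }

    // Explicit byte range instead of POSIX classes: \x20-\x7E is space
    // through tilde, i.e. all printable ASCII characters.
    return preg_replace('/[^\x20-\x7E]/u', '', $input);
}
```

Returning null for invalid UTF-8 is a design choice in this sketch; converting the input to UTF-8 beforehand would be an alternative.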

Something like [^ A-Za-z0-9 \,\.\-]?

This question seems to cover it: PHP Validate string characters are UK or US Keyboard characters

Use hex codes. For example, this strips all non-ASCII characters as well as line endings and replaces them with spaces. The space (\x20) is deliberately left out of the range so that consecutive runs of spaces and/or special characters are replaced with a single space.

$clean = trim(preg_replace('/[^\x21-\x7E]+/', ' ', $input));
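
To see the collapsing behaviour, the one-liner can be wrapped in a small function and run on a mixed input. This is a minimal sketch; the name cleanToAscii and the sample string are chosen here for illustration only.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function cleanToAscii(string $input): string
{
    // Any run of bytes outside \x21-\x7E (spaces, control characters,
    // line endings, bytes of multi-byte characters) becomes one space;
    // trim() then drops a leading or trailing space.
    return trim(preg_replace('/[^\x21-\x7E]+/', ' ', $input));
}

// Three spaces, a Cyrillic word and a CRLF collapse into a single space.
echo cleanToAscii("Hello,   мир\r\nworld!"), "\n"; // prints "Hello, world!"
```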
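
Alternatively, check whether the string contains any multi-byte characters at all:

if (strlen($str) == strlen(utf8_decode($str))) {
    // Byte length equals character count: $str has no multi-byte characters
}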
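
The same check can be sketched without utf8_decode(), which is deprecated as of PHP 8.2. The function name isAsciiOnly is hypothetical, and the sketch assumes "English" means 7-bit ASCII. For valid UTF-8 input it should agree with the length comparison above; behaviour on badly encoded input may differ.

```php
<?php
// Hypothetical helper: true when every byte is in the 7-bit ASCII range.
function isAsciiOnly(string $str): bool
{
    // No u modifier needed: every byte of a multi-byte UTF-8 sequence
    // is >= 0x80, so a byte-wise match is sufficient.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

// Example: reject input instead of cleaning it.
if (!isAsciiOnly("Москва, Тверская 1")) {
    echo "Address contains non-English characters\n";
}
```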
