
Remove emojis / unicode chars

My website and database are set to UTF-8 and utf8mb4, respectively.

On textareas it's perfectly fine when users put utf-8 symbols/emojis.

But on certain input fields (name, address, etc.) I want to rule out those "funny symbols" and only deal with basic text and numbers, including the Danish characters æøå, accented letters, and symbols like -_'@()?=,.:;!"#&<> etc.

How would I go about this?

Is there a native PHP function to strip Unicode symbols/characters, or do I have to find or write a specific regex function for it?

There are functions for checking encoding, such as mb_check_encoding (http://php.net/manual/en/function.mb-check-encoding.php), but to strip out characters I think you would need to use a regex:

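function StripNonUTF($str){
  // The u modifier makes PCRE treat the pattern and subject as UTF-8.
  // PHP has no g modifier: preg_replace() already replaces every match.
  return preg_replace('/[^\pL\pM[:ascii:]]+/u', '', $str);
}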
  • \pL matches any kind of letter from any language
  • \pM matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
  • [:ascii:] matches a character with ASCII value 0 through 127
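
As a rough sketch of how these three pieces could be combined for a name or address field, the following may help. The function name stripSymbols is hypothetical (it is not a PHP built-in), the mbstring extension is assumed to be available for mb_check_encoding, and returning an empty string for malformed input is just one possible choice. It also narrows [:ascii:] to printable ASCII so that control characters are dropped as well.

```php
<?php
// Hypothetical helper for illustration; not part of PHP itself.
function stripSymbols(string $str): string
{
    // With the u modifier, preg_replace() returns null on malformed UTF-8,
    // so reject such input up front (assumes the mbstring extension).
    if (!mb_check_encoding($str, 'UTF-8')) {
        return '';
    }
    // Keep letters (\pL), combining marks (\pM) and printable ASCII
    // (space through ~); remove every other character.
    return preg_replace('/[^\pL\pM\x20-\x7E]+/u', '', $str);
}

// Danish letters and ASCII punctuation are kept, the emoji is removed.
echo stripSymbols("Søren Ærø 😀"), "\n";
```

One caveat: some emoji sequences contain a variation selector (U+FE0F), which Unicode classifies as a mark, so \pM would keep it. If that matters, it probably needs to be stripped separately.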


 