
Remove every character in PHP except letters, numbers and common symbols

How can I use str_ireplace or other functions to remove every character except letters, numbers and the symbols commonly used in HTML, such as " ' ; : . - + = and so on? I also want to remove newlines (\n), spaces, tabs and other whitespace.

The text comes from reading ("textContent"). innerHTML in IE10 and Chrome, and I need the resulting PHP variable to be the same length regardless of which browser produced it. So I need both texts in the same encoding, with rare or browser-specific characters removed.

I tried this, but it doesn't work for me:

        $textForMatch=iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
        $textoForMatc = str_replace(array('\s', "\n", "\t", "\r"), '', $textoForMatch);

$text contains the result of ("textContent"). innerHTML, and I want to delete characters such as é³..

The easiest option is to use preg_replace with a whitelist. That is, use a pattern listing the characters you want to keep, and replace anything not in that list:

$input = 'The quick brown 123 fox said "�é³". Man was I surprised';
$stripped = preg_replace('/[^-\w:";:+=\.\']/', '', $input);
$output = 'Thequickbrown123foxsaid"".ManwasIsurprised';

Regex explanation:

/       - start of regex
[^      - begin negated character class: match any character NOT listed
-       - literal hyphen (first in the class, so it is not read as a range)
\w      - match word characters. Equivalent to A-Za-z0-9_
:";:+=  - literal characters
\.      - escaped period (because a dot has meaning in a regex)
\'      - escaped quote (because the string is in single quotes)
]       - end of character class
/       - end of regex

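This therefore removes anything that is not a word character (letter, digit or underscore) or one of the specific characters listed in the regex. Spaces, tabs and newlines are not in the list, so they are removed as well.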
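
As a rough sketch of how this could be applied to the question's goal (getting the same string from both browsers), the whitelist can be wrapped in a small helper. The function name stripToWhitelist and the two sample strings are made up for illustration, and the duplicate colon is dropped from the character class, which does not change what it matches. It assumes the default "C" locale and no /u modifier, so \w covers only ASCII word characters:

```php
<?php
// Hypothetical helper, not part of PHP: keeps ASCII letters, digits,
// underscore and the whitelisted symbols, and removes everything else.
function stripToWhitelist(string $text): string
{
    // Without the /u modifier (and in the default "C" locale) \w matches
    // only A-Za-z0-9_, so the bytes of characters like "é" are removed too.
    $result = preg_replace('/[^-\w:";+=.\']/', '', $text);

    // preg_replace() returns null on error; fall back to an empty string.
    return $result ?? '';
}

// Invented samples: same visible text, different whitespace per browser.
$fromChrome = "Hello  \"world\"\n\t= 1+1;";
$fromIe     = "Hello \"world\"\r\n= 1+1;";

$a = stripToWhitelist($fromChrome);
$b = stripToWhitelist($fromIe);

var_dump($a === $b);                 // both are reduced to the same string
var_dump(strlen($a) === strlen($b)); // so their lengths match as well
```

If accented letters should be kept instead of removed, the pattern would need the /u modifier and a Unicode-aware class, which is a different whitelist from the one described here.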

