
Regex to strip non-UTF-8 characters but keep newlines

I have a string that contains line feeds and some non-UTF-8 characters. I'm trying to write a regex that removes the non-UTF-8 characters but keeps the line endings.

Below is what I have in PHP:

preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

It strips the non-UTF-8 characters, but it also strips the line endings, and I can't figure out how to keep them.

I've tried /[\\x00-\\x1F\\x80-\\xFF\\^\\n]/ but it hasn't worked.

Add a negative lookahead at the start of the pattern. The lookahead makes the match fail at any position where the next character is a newline, so the newline is never removed.

preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $string);

or, to keep carriage returns (\r) as well:

preg_replace('/(?![\n\r])[\x00-\x1F\x80-\xFF]/', '', $string);
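A minimal sketch of the lookahead pattern in action, assuming a made-up sample string (the input and the variable names $viaLookahead and $viaRanges are illustrative, not from the question). It also shows an equivalent pattern that simply leaves \x0A and \x0D out of the ranges instead of using a lookahead:

```php
<?php
// Hypothetical input: a NUL byte, a tab, a CR/LF pair, and a stray 0xE9 byte.
$string = "line one\x00\t\r\nline\xE9 two\n";

// Lookahead form: skip \n and \r, strip every other control byte and high byte.
$viaLookahead = preg_replace('/(?![\n\r])[\x00-\x1F\x80-\xFF]/', '', $string);

// Same effect without a lookahead: omit \x0A and \x0D from the ranges.
$viaRanges = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]/', '', $string);

var_dump($viaLookahead === "line one\r\nline two\n"); // bool(true)
var_dump($viaLookahead === $viaRanges);               // bool(true)
```

Note that without the /u modifier the pattern works on bytes, so \x80-\xFF removes every non-ASCII byte, including bytes that belong to valid multi-byte UTF-8 sequences. The tab character (\x09) is removed as well.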
