
How to match non-printable characters?

I have a file whose encoding differs from the one the machine uses. When using a regex, . does not match characters that are non-printable in the current character set.

The following prints 0:

echo -e "\xfc" | awk '{ print match( $0, "^.*$" ) }'

How can I match all characters, including non-printable ones?
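A quick way to see what awk is actually given is to dump the raw bytes and check whether they form valid UTF-8. This is a minimal sketch that assumes GNU iconv (which exits non-zero on an illegal input sequence) and od are available. `is_valid_utf8` is a hypothetical helper name, and printf with an octal escape is used instead of `echo -e` because it behaves the same in every POSIX shell:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if stdin is valid UTF-8.
# Relies on iconv exiting non-zero when it hits an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# \374 is the single byte 0xFC (ü in ISO-8859-1); dump it in hex.
printf '\374\n' | od -An -tx1

if printf '\374\n' | is_valid_utf8; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
```

A lone 0xFC byte is not valid UTF-8. If `locale charmap` reports UTF-8, a multibyte-aware awk such as gawk cannot decode that byte as a character, which is consistent with the 0 printed above.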

I can confirm that it doesn't work with the de_DE.UTF-8 locale, but with both de_DE.iso88591 and C it prints 1. I can't tell you why, but the [:alpha:] character class matches:

echo -e "\xfc" | awk '{ print match( $0, "^([[:alpha:]]|.)*$" ) }'
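To reproduce the comparison above on your own machine, you can run both regexes under several locales side by side. This is a sketch: `match_fc` is a hypothetical helper, and the locale list is just the three mentioned in this answer. A locale that is not installed usually falls back to C, so check `locale -a` before trusting a row:

```shell
#!/bin/sh
# Hypothetical helper: print awk's match() result for the single byte 0xFC
# (octal \374) under locale $1 and dynamic regex $2.
# Strings passed with -v undergo escape processing, so avoid backslashes in $2.
match_fc() {
    printf '\374\n' | LC_ALL="$1" awk -v re="$2" '{ print match($0, re) }'
}

# One row per locale: locale name, result for "^.*$", result for the
# [:alpha:] variant.
for loc in C de_DE.iso88591 de_DE.UTF-8; do
    printf '%-16s %s %s\n' "$loc" \
        "$(match_fc "$loc" '^.*$')" \
        "$(match_fc "$loc" '^([[:alpha:]]|.)*$')"
done
```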

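Alternatively, you could change the locale for just that awk call:

echo -e "\xfc" | LC_ALL=de_DE.iso88591 awk '{ print match( $0, "^.*$" ) }'

Setting LC_ALL as a prefix applies it to the awk process only, so there is no old value to save and restore, and LC_ALL takes precedence over LANG and any other LC_* variables that may already be set.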
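If you need this in several places, a small wrapper keeps the locale override scoped to awk. A minimal sketch: `bytewise_awk` is a hypothetical name, and it uses the C locale (which, as noted above, also prints 1) because C is always available, whereas de_DE.iso88591 may not be installed. The last line shows a different approach that the answer does not cover: converting the input with iconv, assuming the file really is ISO-8859-1:

```shell
#!/bin/sh
# Hypothetical wrapper: run awk in the C locale, where each byte is one
# character, so "." matches 0xFC whatever the caller's locale is.
# The LC_ALL assignment applies to this awk process only.
bytewise_awk() {
    LC_ALL=C awk "$@"
}

printf '\374\n' | bytewise_awk '{ print match($0, "^.*$") }'

# Alternative (assumes ISO-8859-1 input): convert to UTF-8 first, so 0xFC
# becomes the valid two-byte sequence C3 BC for ü and a UTF-8 locale can
# decode it.
printf '\374\n' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```

The wrapper keeps the original bytes intact, while the iconv route lets you keep working in a UTF-8 locale.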

See also: "Using special characters in a string argument to the awk match function" and "Current locale settings".

