unix - print distinct list of control characters in a file

Question

For example, given an input file like the one below:

sid|storeNo|latitude|longitude
2|1|-28.03õ720000
9|2
10
jgn
352|1|-28.03¿720000
9|2|fd¿kjhn422-405
000¥0543210|gf¿djk39
gfd|f¥d||fd

Output (the characters below can appear in any order):

¿õ¥

Does anyone have a function (awk, bash, perl, etc.) that could scan each line and then output (in octal, hex or ASCII; any of these is fine) a distinct list of the control characters found? For simplicity, "control characters" here means those above ASCII character 126.

Using perl v5.8.8.

Answer 1

To print the bytes in octal:

perl -ne'printf "%03o\n", ord for /[^\x09\x0A\x20-\x7E]/g' file  | sort -u

To print the bytes in hex:

perl -ne'printf "%02X\n", ord for /[^\x09\x0A\x20-\x7E]/g' file  | sort -u

To print the original bytes (say requires Perl 5.10 or later, so on 5.8.8 use -l with print instead):

perl -nle'print for /[^\x09\x0A\x20-\x7E]/g' file | sort -u

Answer 2

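This should catch everything over ordinal value 126 without having to explicitly weed out outliers:

#!/bin/bash

# Read the file one character at a time; -r keeps backslashes literal.
while IFS= read -r -n1 c; do
  # A leading quote makes printf output the character's numeric value.
  if (( $(printf "%d" "'$c") > 126 )); then
    echo "$c"
  fi
done < ./infile | sort -u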

Output

¥
¿
õ
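What the loop above treats as one "character" depends on the shell's locale. A byte-oriented sketch of the same "greater than 126" test, using od and awk rather than read, might look like the following. This is a different technique from the answer's, the function name bytes_over_126 is hypothetical, and it prints octal values rather than the characters themselves.

```shell
# bytes_over_126 FILE (hypothetical helper name)
# Prints the distinct byte values above 126, in octal, one per line.
bytes_over_126() {
  od -An -v -t u1 "$1" |   # dump every byte as an unsigned decimal number
    tr -s ' ' '\n' |       # put one value on each line
    awk '$1 + 0 > 126 { printf "%03o\n", $1 }' |   # keep values above 126
    sort -u
}

# Usage: bytes_over_126 infile
```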

Answer 3

To delete everything except the control characters (bytes above decimal 126, octal 176):

tr -d '\0-\176' < input > output

To test:

printf 'foobar\n\377' | tr -d '\0-\176' | od -t c

See tr(1) man page for details.
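On its own, tr leaves every surviving byte in place, repeats included. One way to turn that into a distinct list is to dump the remaining bytes with od and de-duplicate them. This is a sketch, with distinct_hex as a hypothetical name, and it assumes the hex form is acceptable output.

```shell
# distinct_hex FILE (hypothetical helper name)
# Prints the distinct bytes above 126 as two-digit hex, one per line.
distinct_hex() {
  tr -d '\0-\176' < "$1" |   # drop bytes 0 through 126, keep the rest
    od -An -v -t x1 |        # show the remaining bytes as hex
    tr -s ' ' '\n' |         # one byte per line
    sed '/^$/d' |            # remove the empty line left by od's leading space
    sort -u
}

# Usage: distinct_hex infile
```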

Answer 4

sed -e 's/[A-Za-z0-9,|]//g' -e 's/-//g' -e 's/./&^M/g' | sort -u

Delete everything you don't want, put everything else on its own line, then sort -u the whole kit.

The "&^M" is "&" followed by Ctrl-V followed by Ctrl-M in Bash.
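The same "delete, split, sort" idea can be sketched without typing a literal control character, by letting fold put each remaining byte on its own line. This is a variation rather than the command above: it deletes all printable ASCII instead of a hand-picked set, forces the C locale so every tool works on single bytes, and uses a hypothetical function name, distinct_raw. Tabs are outside the deleted range, so they would survive into the output.

```shell
# distinct_raw FILE (hypothetical helper name)
# Prints each distinct non-printable-ASCII byte, raw, one per line.
distinct_raw() {
  # Delete printable ASCII (space through tilde), then drop emptied lines.
  LC_ALL=C sed -e 's/[ -~]//g' -e '/^$/d' "$1" |
    LC_ALL=C fold -b -w 1 |   # break each line into one-byte lines
    LC_ALL=C sort -u
}

# Usage: distinct_raw infile | od -c
```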

Unix wins.

4 answers

solution1 2 ACCEPTED 2011-12-31 06:53:49

solution2 2 2011-12-31 06:57:17

solution3 2

solution4 0 2011-12-31 06:50:40
