Trying to remove non-printable characters (junk values) from a UNIX file

Question

I am trying to remove non-printable characters (e.g. ^@) from the records in my file. Because the file holds so many records, reading it with cat in a loop is not an option: the loop takes too much time. I tried using

sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME

but the ^@ characters are still not removed. I also tried

awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print }' FILENAME > NEWFILE

but it also did not help.

Can anybody suggest some alternative way to remove non-printable characters?

I also used tr -cd, but it removes accented characters, and those are required in the file.

Answer 1

Perhaps you could delete the complement of [:print:], the class that contains all printable characters. Newline is not part of [:print:], so add \n to the set if the line breaks must survive:

tr -cd '[:print:]\n' < file > newfile

If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):

sed 's/[^[:print:]]//g' file
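If tr is byte-oriented and no UTF-8 locale is at hand, one possible workaround is to list the bytes to delete instead of the bytes to keep. The sketch below assumes the file is UTF-8 and that only ASCII control bytes count as junk; strip_ctrl is a made-up helper name, not something from the answers above. Because every byte of a multi-byte UTF-8 sequence is \200 or higher, accented characters pass through untouched.

```shell
#!/bin/sh
# strip_ctrl: hypothetical helper name used only for this sketch.
# Deletes ASCII control bytes (NUL through backspace, vertical tab through
# unit separator, and DEL) while keeping tab (\011) and newline (\012).
# Note that carriage return (\015) falls in the deleted range.
# LC_ALL=C makes tr operate on single bytes, so bytes >= \200 are kept.
strip_ctrl() {
    LC_ALL=C tr -d '\000-\010\013-\037\177'
}

# Demo: "café", then a NUL and a \001 byte, then " ok".
printf 'caf\303\251\000\001 ok\n' | strip_ctrl   # prints: café ok (on a UTF-8 terminal)
```

For a real file the call would be strip_ctrl < file > newfile. Unlike an in-place edit, this writes a new file, so the original stays available for comparison.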

Answer 2

Remove most control characters first (this keeps only bytes \007-\015 and \040-\376):

tr -dc '\007-\011\012-\015\040-\376' < file > newfile

Then try your sed expression:

sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' newfile

I believe that the ^@ you see is in fact a zero byte (\0).
The tr filter above will remove those as well.
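To check whether the ^@ really is a NUL byte before deleting anything, something along these lines may help. It is only a sketch: show_bytes and strip_nul are illustrative names, and it assumes a cat that supports -v (GNU and BSD cat both do), which prints control bytes in caret notation.

```shell
#!/bin/sh
# show_bytes and strip_nul are hypothetical helper names for this sketch.

# Print control bytes in caret notation; a NUL byte shows up as ^@.
show_bytes() {
    cat -v
}

# Delete NUL bytes only; every other byte passes through unchanged.
strip_nul() {
    tr -d '\000'
}

# Demo on a line with an embedded NUL byte.
printf 'ab\000cd\n' | show_bytes               # prints: ab^@cd
printf 'ab\000cd\n' | strip_nul | show_bytes   # prints: abcd
```

If NUL is the only junk byte, strip_nul < file > newfile is enough, and accented characters are left alone because no other byte is touched.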

Answer 3

strings -1 file... > outputfile

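seems to work. The strings program prints every run of printable characters that is at least a given length, here 1 (the -1 argument). In effect it removes all the non-printable characters. Note that each run is printed on its own line, so a line that contained a junk byte comes out split in two.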

"man strings" will provide the documentation.

Answer 4

I was searching for this for a while and found a rather simple solution:

The ansifilter package handles this when the junk consists of ANSI terminal escape sequences, which it strips. All you need to do is pipe the output through it.

On Mac:

brew install ansifilter

Then:

cat file.txt | ansifilter

solution1
14 ACCEPTED 2015-12-22 09:48:15

solution2
3

solution3
0 2019-11-05 22:38:09

solution4
0 2021-11-02 18:26:07
