I am trying to remove non-printable character (for eg ^@
) from records in my file. Since the volume to records is too big in the file using cat is not an option as the loop is taking too much time. I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
but still the ^@
characters are not removed. Also I tried using
awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print } FILENAME > NEW FILE
but it also did not help.
Can anybody suggest some alternative way to remove non-printable characters?
Used tr -cd
but it is removing accented characters. But they are required in the file.
Perhaps you could go with the complement of [:print:]
, which contains all printable characters:
tr -cd '[:print:]' < file > newfile
If your version of tr
doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
Remove all control characters first:
tr -dc '\007-\011\012-\015\040-\376' < file > newfile
Then try your string:
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' newfile
I believe that what you see ^@
is in fact a zero value \\0
.
The tr
filter from above will remove those as well.
strings -1 file... > outputfile
seems to work. The strings program will take all printable characters, in this case of length 1 (the -1 argument) and print them. It effectively is removing all the non-printable characters.
"man strings" will provide the documentation.
Was searching for this for a while & found a rather simple solution:
The package ansifilter
does exactly this. All you need to do is just pipe the output through it.
On Mac:
brew install ansifilter
Then:
cat file.txt | ansifilter
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.