How to not display non-printable characters with wget output?

Question

I'm writing a bash script to retrieve some HTML content. The command line is:

wget http://some_url.com -q -O output.txt -o /dev/null

But when the page contains images, wget still "displays" them as non-printable characters.

Is there a way to tell wget not to display those non-printable characters?

Cheers

PS: In fact, I can't grep "output.txt" at all, because grep treats it as a binary file (due to the non-printable characters).

You can try with this URL, for instance: https://www.offensive-security.com/pwbonline/icq.html

Answer 1

Usually HTML documents won't contain binary data. I can't reproduce this specific problem.
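Since HTML is normally plain text, it may help to check what the downloaded file actually contains. The following is a minimal sketch, assuming the page was saved to output.txt as in the question; `count_suspicious_bytes` is a hypothetical helper name. It counts NUL bytes (the classic trigger for GNU grep's binary-file detection) and any other bytes outside printable ASCII and whitespace. Note that in the C locale valid UTF-8 characters (accents etc.) also land in the second count, so a non-zero value there is not proof of corruption.

```shell
#!/usr/bin/env bash
# Count the bytes that can make grep treat a file as binary.
# count_suspicious_bytes is a hypothetical helper name.
count_suspicious_bytes() {
  local file=$1 nul other
  # Keep only NUL bytes (-c complements the set, -d deletes) and count them
  nul=$(LC_ALL=C tr -cd '\000' < "$file" | wc -c)
  # Delete printable ASCII, whitespace and NUL; count whatever is left.
  # Valid UTF-8 multibyte characters are counted here too.
  other=$(LC_ALL=C tr -d '[:print:][:space:]\000' < "$file" | wc -c)
  # $((...)) strips the padding some wc implementations add
  echo "NUL bytes: $((nul)), other non-printable/non-ASCII bytes: $((other))"
}

# Demo on a small temporary sample; for the real case run:
#   count_suspicious_bytes output.txt
sample=$(mktemp)
printf 'abc\000\001\n' > "$sample"
count_suspicious_bytes "$sample"   # NUL bytes: 1, other non-printable/non-ASCII bytes: 1
rm -f "$sample"
```

If the NUL count is zero and the file still looks binary to grep, the cause is more likely the locale/encoding than actual image data in the HTML.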

If the goal is just to force grep to search files it would otherwise skip because it assumes they are binary, use --binary-files=text:

wget -O- http://server.com/url | grep --binary-files=text 'foo.*bar'
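To try the pipeline above without network access, here is a minimal sketch in which `page` is a hypothetical stand-in for `wget -q -O- URL` that emits a small "page" containing a NUL byte. It compares plain grep, grep with `--binary-files=text` (GNU and BSD grep also accept the short form `-a`), and, as an alternative, deleting the non-printable bytes with `tr` before grepping normally.

```shell
#!/usr/bin/env bash
# page is a hypothetical stand-in for: wget -q -O- http://server.com/url
page() { printf '<html>\000\n<p>foo and bar</p>\n'; }

# Plain grep: GNU grep typically reports only that a binary file matches.
page | grep 'foo.*bar'

# --binary-files=text (short form: -a) prints the matching line itself.
page | grep --binary-files=text 'foo.*bar'

# Alternative: delete every byte that is not printable ASCII, tab,
# newline or CR, then grep normally. LC_ALL=C makes [:print:] the ASCII
# range 0x20-0x7E, so this also strips non-ASCII (e.g. UTF-8) characters.
page | LC_ALL=C tr -cd '[:print:]\t\n\r' | grep 'foo.*bar'
```

In a real script, replace `page` with the actual wget command; the `tr` variant is useful when the cleaned text is needed for further processing, not just for grep.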

How to not display non-printable characters with wget output?

Question

1 answers

solution1
0 2016-04-19 08:51:09
