
How to not display non-printable characters in wget output?

I'm writing a bash script to retrieve some HTML content. The command line is:

wget http://some_url.com -q -O output.txt -o /dev/null

But when there are images in the page, wget still "displays" them as non-printable characters.

Is there a way to tell wget not to display those non-printable characters?

Cheers

PS: As a matter of fact, I can't run grep on "output.txt" because it is treated as a binary file (due to the non-printable characters).

You can try with this URL, for instance: https://www.offensive-security.com/pwbonline/icq.html

Usually HTML documents won't contain binary data. I can't reproduce this specific problem.

If the goal is just to force grep to search files it would usually skip because it assumes they are binary, use --binary-files=text:

wget -O- http://server.com/url | grep --binary-files=text 'foo.*bar'
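If the non-printable bytes should be removed rather than just tolerated by grep, one option is to filter the stream through tr before searching or saving it. The following is a minimal sketch, not something wget provides itself: strip_nonprintable is a hypothetical helper name, and the printf line is only a stand-in for real wget output. Note that it keeps ASCII only, so multi-byte UTF-8 characters would be dropped as well.

```shell
# Hypothetical helper (not part of wget or grep): keep only tab, newline,
# carriage return, and printable ASCII (octal 040-176); delete every other byte.
strip_nonprintable() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Stand-in for `wget -q -O- URL`: sample input with an embedded NUL byte,
# which is the kind of byte that makes grep treat its input as binary.
printf 'foo\000 baz bar\nunrelated line\n' | strip_nonprintable | grep 'foo.*bar'
```

With a real page, the same filter could sit between wget and the output file, for example: wget -q -O- http://server.com/url | strip_nonprintable > output.txt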
