I'm writing a bash script to retrieve some HTML content; the command line is:
wget http://some_url.com -q -O output.txt -o /dev/null
But when the page contains images, wget still "displays" them as non-printable characters.
Is there a way to tell wget not to display those non-printable characters?
Cheers
PS: in fact, I can't grep "output.txt" at all, because grep treats it as a binary file (due to the non-printable characters).
You can try this URL, for instance: https://www.offensive-security.com/pwbonline/icq.html
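Usually HTML documents won't contain binary data: images are separate files that the page only references (for example via <img> tags), and wget downloads only the URL you give it unless told otherwise. I can't reproduce this specific problem.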
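To check whether the saved page really contains data that grep considers binary, you can count its NUL bytes: GNU grep treats any input containing NUL bytes as binary (and, depending on the locale, also bytes that are not valid text). A minimal sketch, assuming the page was saved as output.txt as in the question; count_nul_bytes is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# count_nul_bytes FILE: print how many NUL bytes FILE contains.
# Hypothetical helper; GNU grep treats input containing NUL bytes as binary.
count_nul_bytes() {
    # tr -cd '\000' deletes every byte except NUL, so wc -c counts the NULs.
    # LC_ALL=C makes tr work byte by byte regardless of the current locale;
    # the arithmetic expansion strips the padding some wc versions print.
    echo $(( $(LC_ALL=C tr -cd '\000' < "$1" | wc -c) ))
}

# Usage, assuming the page was saved as output.txt as in the question.
if [ -f output.txt ]; then
    n=$(count_nul_bytes output.txt)
    if [ "$n" -gt 0 ]; then
        echo "output.txt contains $n NUL byte(s); grep will treat it as binary"
    else
        echo "output.txt has no NUL bytes; a binary verdict would come from invalid text bytes"
    fi
fi
```

If the count is zero but grep still calls the file binary, the likely cause (with GNU grep) is byte sequences that are not valid in the current locale's encoding.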
If it is just a matter of forcing grep to search files that it assumes are binary (by default it then only reports that the binary file matches, without printing the matching lines), use --binary-files=text:
wget -O- http://server.com/url | grep --binary-files=text 'foo.*bar'
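A minimal, self-contained sketch of the difference, assuming GNU grep, where -a is the short form of --binary-files=text. search_as_text is a hypothetical helper name, and the test file is generated on the fly rather than downloaded:

```shell
#!/bin/sh
# search_as_text PATTERN FILE: grep FILE as text even if it looks binary.
# Hypothetical helper; --binary-files=text (short form: -a) makes grep print
# the matching lines instead of only reporting that a binary file matches.
search_as_text() {
    grep --binary-files=text -- "$1" "$2"
}

# Demonstration on a temporary file: one matching text line followed by a
# line containing a NUL byte, which makes grep classify the file as binary.
sample=$(mktemp)
printf 'foo 1 bar\n\000binary junk\n' > "$sample"

# Default mode: grep only reports a binary match (wording varies by version).
grep 'foo.*bar' "$sample"

# Text mode: prints the matching line itself, "foo 1 bar".
search_as_text 'foo.*bar' "$sample"

rm -f "$sample"
```

In the pipeline from the answer, the same option works unchanged, since grep reads wget's output from standard input.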