
Scraping website with node.js request and getting weird characters

I'm using nwjs (v0.18.8) and making requests to mangafox.me to build a manga reader.

It works fine with http://mangafox.me/directory/

However, requesting a manga page such as http://mangafox.me/manga/onepunch_man/vTBD/c066/1.html returns these weird symbols:

{s F [ w#Y \\ AI (tY dϯ M%9 @ Cw ~ I(v ں ʑ y t k2z o y .^~wɌ e Ҳ ]?c Kf =v 0 3? y`Y _̘gY|fY \\ Q2 M nV iz g b$W _a c C5

How can I fix this?

Never mind x) It turned out the response was simply gzip-compressed. If you run into the same problem, add gzip: true to the request options (not the headers). The request library then sends an Accept-Encoding header and decompresses the response for you. For example:

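const request = require('request');

request({ url: '*****', gzip: true }, function (error, response, html) {
   // gzip: true requests a compressed body and decompresses it before calling us
   if (!error && response.statusCode === 200) {
      // Do something with html
   }
});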
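To see why the page came back as "weird characters", here is a small sketch using Node's built-in zlib module. It simulates a gzipped HTML body, prints it as text (garbage), then decodes it. The helpers looksGzipped and decodeBody are hypothetical names for illustration; they only approximate what gzip: true does inside the request library.

```javascript
const zlib = require('zlib');

// Every gzip stream starts with the magic bytes 0x1f 0x8b.
function looksGzipped(buf) {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Hypothetical helper: decode a raw body according to its Content-Encoding
// header, falling back to sniffing the gzip magic bytes if the header is absent.
function decodeBody(rawBody, contentEncoding) {
  const encoding = (contentEncoding || '').trim().toLowerCase();
  if (encoding === 'gzip' || (!encoding && looksGzipped(rawBody))) {
    return zlib.gunzipSync(rawBody).toString('utf8');
  }
  if (encoding === 'deflate') {
    // Assumes the zlib-wrapped deflate format that HTTP specifies.
    return zlib.inflateSync(rawBody).toString('utf8');
  }
  return rawBody.toString('utf8');
}

// Simulate a server that gzips its HTML before sending it.
const html = '<html><body><img src="1.jpg"></body></html>';
const compressed = zlib.gzipSync(html);

// Printing the compressed bytes as text produces the "weird characters".
console.log(compressed.toString('utf8'));

// Decoding according to the header recovers the original HTML.
console.log(decodeBody(compressed, 'gzip'));
```

With a real response you would pass the raw body Buffer and response.headers['content-encoding'] (Node lowercases header names), assuming you did not already let the library decompress it.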

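You don't need Node.js for something this simple. The easiest way to scrape a site is to load it into a hidden iframe and then loop through the document's collections of the elements you need. This only works if your script is allowed to read the frame's document: in an ordinary browser, the same-origin policy blocks access to a cross-origin iframe's document.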
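A minimal browser-side sketch of the hidden-iframe approach, assuming the target page is readable from your script. The function name loadInHiddenIframe and its extract callback are hypothetical; the code waits for the frame's load event, hands the loaded document to the callback, and removes the frame afterwards.

```javascript
// Hypothetical helper: load `url` in a hidden iframe, run `extract` on the
// loaded document, then remove the frame. Returns a Promise of the result.
function loadInHiddenIframe(url, extract) {
  return new Promise((resolve, reject) => {
    const frame = document.createElement('iframe');
    frame.style.display = 'none';
    frame.addEventListener('load', () => {
      try {
        // Throws a SecurityError if the framed page is cross-origin.
        resolve(extract(frame.contentWindow.document));
      } catch (err) {
        reject(err);
      } finally {
        frame.remove();
      }
    });
    frame.src = url;
    document.body.appendChild(frame);
  });
}
```

Usage might look like loadInHiddenIframe('/directory/', (doc) => doc.title).then(console.log), where the path is only an example.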

The loaded document exposes its contents through collections like these (cookie is the exception: it is a single string, not a collection):

 Frame.contentWindow.document.forms
 Frame.contentWindow.document.scripts
 Frame.contentWindow.document.styleSheets
 Frame.contentWindow.document.embeds
 Frame.contentWindow.document.cookie
 Frame.contentWindow.document.images
 Frame.contentWindow.document.links

And so forth...
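Looping over these properties might look like the sketch below. summarizeDocument is a hypothetical name, and which fields you keep is an assumption; the point is that images, links, forms and scripts are array-like collections (so Array.from works on them), while cookie has to be split as a string.

```javascript
// Sketch: turn a document's collections into plain arrays of data.
function summarizeDocument(doc) {
  return {
    // HTMLCollections are array-like, not real arrays, so convert them.
    images: Array.from(doc.images, (img) => img.src),
    links: Array.from(doc.links, (a) => a.href),
    forms: Array.from(doc.forms, (f) => f.action),
    // Inline scripts have an empty src, so drop those.
    scripts: Array.from(doc.scripts, (s) => s.src).filter(Boolean),
    // cookie is one "name=value; name2=value2" string.
    cookies: doc.cookie ? doc.cookie.split('; ') : [],
  };
}
```

In the iframe setup from the answer this would be called as summarizeDocument(Frame.contentWindow.document).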

