Node.js, request body empty for certain websites

I'm experimenting with Node.js and web scraping. In this case, I'm trying to scrape the most recent songs from a local radio station for display. With this particular website, body comes back empty. When I try Google or any other website, body has a value. Is this a feature of the website I'm trying to scrape?

Here's my code:

var request = require('request');

var url = "http://www.radiomilwaukee.org";
request(url, function(err,resp,body) {
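    if (!err && resp.statusCode === 200) {
        console.log(body);
    } else {
        // err is null when the request succeeded but the status was not 200
        console.log(err || 'Unexpected status: ' + resp.statusCode);
    }
});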

That's weird: the website you're requesting doesn't seem to return anything unless the accept-encoding header is set to gzip. With that in mind, using this gist will work: https://gist.github.com/nickfishman/5515364

I ran the code from that gist, replacing the URL with "http://www.radiomilwaukee.org", and saw the page content in the sample.html file once the code had completed.

If you'd rather have access to the web page's content within the code, you could do something like this:

// ...

req.on('response', function(res) {
    // start from an empty string so the result is not prefixed with "undefined"
    var body = '', encoding, unzipped;

    if (res.statusCode !== 200) throw new Error('Status not 200');

    encoding = res.headers['content-encoding'];
    if (encoding === 'gzip') {
        unzipped = res.pipe(zlib.createGunzip());
        unzipped.setEncoding('utf8');
        unzipped.on('data', function(chunk) {
            // collect the decompressed content in the body variable
            body += chunk;
        });
        unzipped.on('end', function() {
            // body now holds the complete page
        });
    }

    // ...
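
For anyone who wants the whole flow in one place without the request module, here is a minimal sketch using only Node's built-in http and zlib modules. The helper names decodeBody and fetchBody are made up for illustration and are not part of the gist. It assumes the server replies with either gzip or an uncompressed body, and that the page is UTF-8.

```javascript
const http = require('http');
const zlib = require('zlib');

// Hypothetical helper: turn the raw response bytes into text,
// gunzipping first when the content-encoding header says gzip.
function decodeBody(buffer, encoding) {
    if (encoding === 'gzip') {
        return zlib.gunzipSync(buffer).toString('utf8');
    }
    return buffer.toString('utf8');
}

// Hypothetical helper: request a URL while advertising gzip support,
// then pass the decoded page to callback(err, body).
function fetchBody(url, callback) {
    const options = { headers: { 'accept-encoding': 'gzip' } };
    http.get(url, options, function (res) {
        if (res.statusCode !== 200) {
            res.resume(); // discard the body so the socket is released
            return callback(new Error('Status ' + res.statusCode));
        }
        const chunks = [];
        res.on('data', function (chunk) { chunks.push(chunk); });
        res.on('end', function () {
            try {
                const encoding = res.headers['content-encoding'];
                callback(null, decodeBody(Buffer.concat(chunks), encoding));
            } catch (e) {
                callback(e); // e.g. the body was not valid gzip data
            }
        });
    }).on('error', callback);
}
```

Calling fetchBody("http://www.radiomilwaukee.org", function (err, body) { ... }) should then give you the page text in body, provided the site still behaves as described above. Buffering the whole response before decompressing keeps the sketch short; for large pages, piping through zlib.createGunzip() as in the snippet above is the better fit.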
