Node.js: request body empty for certain websites
I'm experimenting with Node.js and web scraping. In this case, I'm trying to scrape the most recent songs from a local radio station for display. With this particular website, body comes back empty. When I try Google or any other website, body has a value. Is this a feature of the website I'm trying to scrape?
Here's my code:
var request = require('request');
var url = "http://www.radiomilwaukee.org";
request(url, function(err, resp, body) {
    if (!err && resp.statusCode == 200) {
        console.log(body);
    }
    else
    {
        console.log(err);
    }
});
That's weird: the website you're requesting doesn't seem to return anything unless the accept-encoding header is set to gzip. With that in mind, using this gist will work: https://gist.github.com/nickfishman/5515364
I ran the code within that gist, replacing the URL with "http://www.radiomilwaukee.org", and saw the content in the sample.html file once the code had completed.
If you'd rather have access to the web page's content within the code, you could do something like this:
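// ...
req.on('response', function(res) {
    // start from an empty string so += does not prepend "undefined"
    var body = '', encoding, unzipped;
    if (res.statusCode !== 200) throw new Error('Status not 200');
    encoding = res.headers['content-encoding'];
    if (encoding == 'gzip') {
        unzipped = res.pipe(zlib.createGunzip());
        // decode chunks as UTF-8 text (assumes the page is UTF-8)
        unzipped.setEncoding('utf8');
        unzipped.on('data', function(chunk) {
            // collect the content in the body variable
            body += chunk;
        });
        unzipped.on('end', function() {
            // body now holds the complete decompressed page
            console.log(body);
        });
    }
// ...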
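For reference, the same idea can be sketched with only Node's built-in http and zlib modules. This is a minimal sketch, not the code from the gist: the helper names decodeBody and fetchPage are hypothetical, it assumes the page is UTF-8 text, and it handles gzip only (not deflate or br). It buffers the whole response and then decompresses it in one step, which is simpler than streaming but holds the full body in memory.

```javascript
const http = require('http');
const zlib = require('zlib');

// Hypothetical helper: decode a fully buffered body according to the
// value of the response's Content-Encoding header.
function decodeBody(buffer, contentEncoding) {
  if (contentEncoding === 'gzip') {
    return zlib.gunzipSync(buffer).toString('utf8');
  }
  // No (or unrecognised) encoding: treat the bytes as plain UTF-8 text.
  return buffer.toString('utf8');
}

// Hypothetical helper: GET a page while advertising gzip support,
// then hand the decoded text to callback(err, body).
function fetchPage(url, callback) {
  const options = { headers: { 'Accept-Encoding': 'gzip' } };
  const req = http.get(url, options, (res) => {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      callback(new Error('Status not 200: ' + res.statusCode));
      return;
    }
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => {
      let body;
      try {
        body = decodeBody(Buffer.concat(chunks), res.headers['content-encoding']);
      } catch (e) {
        callback(e); // e.g. corrupt gzip data
        return;
      }
      callback(null, body);
    });
  });
  req.on('error', callback);
}
```

Calling fetchPage("http://www.radiomilwaukee.org", function(err, body) { ... }) would then deliver the page text in body, provided the site still behaves as described above.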