Getting “Access Denied” when loading a URL in JSDOM
I am trying to scrape some information from a page using the jsdom.env function. However, the page returned in the env() callback is an "access denied" page, not the content I see when I load the same URL in a browser.
Thus, there seems to be a difference between how the browser loads the page and how jsdom loads it. Is this something that can be configured in the jsdom module?
Edit:
Example URL: http://www.bestbuy.com/site/HP+-+20%22+Widescreen+Flat-Panel+LCD+Monitor/1422209.p?id=1218257754431&skuId=1422209
Update:
The issue was that jsdom does not specify the User-Agent HTTP header. See the detailed answer below.
The problem was that jsdom does not specify a 'User-Agent' HTTP header, which the bestbuy.com server checks for. If the header is empty, access is denied. Currently, there is no way of specifying this through jsdom: https://github.com/tmpvar/jsdom/issues/196
A workaround that worked for me is to use the request module to get the page content and then pass it on to jsdom to work on. The request module allows you to specify a user agent.
Example:
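    var request = require('request');

    // Fetch a page with an explicit User-Agent header and pass the HTML to the callback.
    var getPage = function (someUri, callback) {
        request({ uri: someUri, headers: { 'User-Agent': 'Mozilla/5.0' } }, function (error, response, body) {
            if (error) {
                // Network-level failure: there is no body to pass on.
                return callback(error);
            }
            console.log('Fetched ' + someUri + ' OK!');
            callback(null, body);
        });
    };

    getPage('http://www.bestbuy.com/', function (error, body) {
        if (error) { return console.error(error); }
        console.log(body);
    });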
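If you would rather not depend on the request module, roughly the same fetch can be sketched with Node's built-in http module. This is only a minimal sketch and not the approach from the answer above: buildOptions and getPageWithUserAgent are hypothetical helper names, it handles plain http:// URLs only, and it does not follow redirects.

```javascript
var http = require('http');
var url = require('url');

// Hypothetical helper: turn a page URL into http.get options
// that always carry a User-Agent header.
function buildOptions(pageUrl, userAgent) {
    var parsed = new url.URL(pageUrl);
    return {
        hostname: parsed.hostname,
        port: parsed.port || 80,
        path: parsed.pathname + parsed.search,
        headers: { 'User-Agent': userAgent || 'Mozilla/5.0' }
    };
}

// Hypothetical helper: fetch the page and collect the response body as a string.
function getPageWithUserAgent(pageUrl, callback) {
    http.get(buildOptions(pageUrl), function (response) {
        var body = '';
        response.setEncoding('utf8');
        response.on('data', function (chunk) { body += chunk; });
        response.on('end', function () {
            callback(null, response.statusCode, body);
        });
    }).on('error', function (error) {
        callback(error);
    });
}
```

The HTML passed to the callback can then be handed to jsdom in the same way as the body returned by request.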
By default, cross-domain AJAX calls are not possible. More info here: http://m.snook.ca/archives/javascript/cross_domain_aj