
Getting “Access Denied” when loading a URL in JSDOM

I am trying to scrape some information from a page using the jsdom.env function. However, the page returned in the env() callback says that access to the server is denied, instead of showing the content I see when I load the same URL in a browser.

So there seems to be a difference between how the browser loads the page and how jsdom loads it. Is this something that can be configured in the jsdom module?

Edit:

Example URL: http://www.bestbuy.com/site/HP+-+20%22+Widescreen+Flat-Panel+LCD+Monitor/1422209.p?id=1218257754431&skuId=1422209

Update:

The issue was that jsdom does not send a User-Agent HTTP header. See the detailed answer below.

The problem was that jsdom does not send a 'User-Agent' HTTP header, which the bestbuy.com server checks for. If it is empty, access is denied. Currently, there is no way of specifying this through jsdom: https://github.com/tmpvar/jsdom/issues/196

A workaround that worked for me was to use the request module to fetch the page content and then pass it on to jsdom to work on. The request module allows you to specify a user agent.

Example:

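var request = require('request');

// Fetch a page with an explicit User-Agent header.
// Calls callback(error) on failure, callback(null, body) on success.
var getPage = function (someUri, callback) {
  request({uri: someUri, headers: {'User-Agent': 'Mozilla/5.0'}}, function (error, response, body) {
    if (error) {
      return callback(error);
    }
    console.log('Fetched ' + someUri + ' (status ' + response.statusCode + ')');
    callback(null, body);
  });
};

getPage('http://www.bestbuy.com/', function (error, body) {
  if (error) {
    return console.error(error);
  }
  console.log(body);
});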
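
The example above stops at printing the body. As a rough sketch of the hand-off to jsdom, something like the following could work. It assumes the `request` and `jsdom` packages are installed, and it uses the `JSDOM` constructor from newer jsdom releases rather than the `jsdom.env` function mentioned in the question. The names `requestOptions` and `scrapeTitle` are made up for illustration.

```javascript
// Hypothetical helper: builds the options object passed to `request`,
// always including a User-Agent header so the server does not reject us.
function requestOptions(someUri, userAgent) {
  return {
    uri: someUri,
    headers: { 'User-Agent': userAgent || 'Mozilla/5.0' }
  };
}

// Hypothetical helper: fetches the page, parses it with jsdom, and
// passes the document title to callback(error, title).
function scrapeTitle(someUri, callback) {
  // Third-party packages are loaded here, when the function is called,
  // so requestOptions can be used without them being installed.
  var request = require('request');
  var JSDOM = require('jsdom').JSDOM;

  request(requestOptions(someUri), function (error, response, body) {
    if (error) {
      return callback(error);
    }
    // Parse the HTML string fetched by `request`; jsdom makes no network call here.
    var dom = new JSDOM(body);
    callback(null, dom.window.document.title);
  });
}
```

It could then be called as `scrapeTitle('http://www.bestbuy.com/', function (error, title) { ... })`. Because jsdom only receives an HTML string, the headers sent to the server are controlled entirely by the fetching code.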

By default, cross-domain AJAX calls are not possible. More info here: http://m.snook.ca/archives/javascript/cross_domain_aj
