When a request is made with the crawler module in an AWS Lambda Node.js environment, 403 is returned for two specific domains

When I make requests with the crawler module from an AWS Lambda function (Node.js), two specific domains return 403. However, requesting either of those domains on its own does not return 403. The same thing happens when I use the request module instead of the crawler module.

Running the same logic locally returns 200 for every response, as expected.

This is the source uploaded to Lambda.

const Crawler = require('crawler');
const urls = [
  'http://www.ddengle.com',
  'http://www.cointalk.co.kr',
  'http://www.chaintalk.io',
  'http://www.coinpan.com',
  'http://www.hozaebox.com',
  'https://gall.dcinside.com/board/lists?id=bitcoins',
  'https://gall.dcinside.com/mgallery/board/lists?id=coin',
];

exports.handler = async (event) => {
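  // Resolve once the crawler queue is drained so Lambda waits for every request.
  return new Promise((resolve) => {
    const crawler = new Crawler({
      maxConnections: 10,
      jQuery: 'whacko',
      callback(err, res, done) {
        if (err) {
          // Log the failure and release the slot; throwing here would skip done().
          console.error(`[crawler] request failed: ${err.message}`);
          return done();
        }
        const hostname = res.request.uri.hostname;

        if (res.statusCode === 200) {
          console.log(hostname);
        } else {
          console.log(`[crawler] ${hostname} statusCode ${res.statusCode}`);
        }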

        done();
      },
    });

    crawler.on('drain', () => {
      resolve();
    });

    urls.forEach((e) => {
      crawler.queue([{
        headers: {
          'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36',
        },
        uri: e,
      }]);
    });
  });
};

These are simple requests, so I expect every URL to respond with 200. However, when I run the test on Lambda, either www.ddengle.com or coinpan.com always returns 403.

START RequestId: f3cc4977-11af-4ab1-9556-9b778efd1f72 Version: $LATEST
2019-08-23T08:13:36.593Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    www.chaintalk.io
2019-08-23T08:13:36.811Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    [crawler] www.ddengle.com statusCode 403
2019-08-23T08:13:37.170Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    www.hozaebox.com
2019-08-23T08:13:37.454Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    gall.dcinside.com
2019-08-23T08:13:37.873Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    gall.dcinside.com
2019-08-23T08:13:38.391Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    www.cointalk.co.kr
2019-08-23T08:13:39.153Z    f3cc4977-11af-4ab1-9556-9b778efd1f72    INFO    coinpan.com
END RequestId: f3cc4977-11af-4ab1-9556-9b778efd1f72

Move your Lambda into a VPC so that its requests are made from a specific IP. I am assuming that when Lambda runs outside a VPC (the default behavior), there is no fixed IP associated with it, and the website you are trying to crawl might be blocking it.
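
To see which address the target sites actually observe, before and after moving the function into a VPC, the Lambda can log its own outbound IP. This is only a rough sketch: `getOutboundIp` and `logOutboundIp` are hypothetical helper names, and it assumes https://checkip.amazonaws.com (which replies with the caller's public IP as plain text) is reachable from the function.

```javascript
const https = require('https');

// Hypothetical helper: ask checkip.amazonaws.com which public IP our
// outbound traffic appears to come from. Resolves with the IP as a string.
function getOutboundIp() {
  return new Promise((resolve, reject) => {
    https
      .get('https://checkip.amazonaws.com', (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => {
          body += chunk;
        });
        // The service answers with the IP followed by a newline.
        res.on('end', () => resolve(body.trim()));
      })
      .on('error', reject);
  });
}

// Hypothetical helper: log the address so it shows up in the Lambda logs.
async function logOutboundIp() {
  const ip = await getOutboundIp();
  console.log(`[ip] outbound address ${ip}`);
  return ip;
}
```

Calling `await logOutboundIp()` at the top of the handler on a few invocations should show whether the source address changes between runs.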

The domain ddengle.com seems to be protected by Cloudflare and is only accessible through https.

curl -I http://www.ddengle.com
HTTP/1.1 301 Moved Permanently
Date: Fri, 23 Aug 2019 08:31:50 GMT
Connection: keep-alive
Cache-Control: max-age=3600
Expires: Fri, 23 Aug 2019 09:31:50 GMT
Location: https://www.ddengle.com/
X-Content-Type-Options: nosniff
Server: cloudflare
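
Since the plain-HTTP request above is redirected to https://www.ddengle.com/, one thing to try is queueing the https URL directly so the crawler does not go through the redirect. A minimal sketch follows; `toHttps` is a hypothetical helper, not part of crawler, and I have not verified that this removes the 403. It should only be applied to hosts that are known to serve https.

```javascript
// Hypothetical helper: upgrade an http:// URL to https:// and leave
// every other URL untouched.
function toHttps(url) {
  const parsed = new URL(url); // WHATWG URL, available as a global in Node.js 10+
  if (parsed.protocol === 'http:') {
    parsed.protocol = 'https:';
  }
  return parsed.toString();
}
```

For example, `toHttps('http://www.ddengle.com')` yields `https://www.ddengle.com/`, which matches the Location header in the curl output.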

The 403 could be caused by a country block, a WAF rule, a browser integrity check, and so on.

Full list here: https://community.cloudflare.com/t/community-tip-fixing-error-403-forbidden/53308
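
To narrow down which of these applies, it may help to log a little more than the status code. The sketch below builds a log line that records whether the response came through Cloudflare and its CF-RAY ID. `summarizeResponse` is a hypothetical helper; it assumes `headers` is the lower-cased header object that Node.js exposes as `res.headers`. Note that these headers only show that the site sits behind Cloudflare, not which rule produced the 403.

```javascript
// Hypothetical helper: build a log line with Cloudflare details.
// `headers` is expected to use lower-cased names, as in res.headers.
function summarizeResponse(hostname, statusCode, headers) {
  const server = String(headers.server || '').toLowerCase();
  // Responses proxied by Cloudflare carry "Server: cloudflare" and a CF-RAY ID.
  const viaCloudflare = server === 'cloudflare' || headers['cf-ray'] !== undefined;
  const ray = headers['cf-ray'] || 'n/a';
  return `[crawler] ${hostname} statusCode ${statusCode} viaCloudflare=${viaCloudflare} cf-ray=${ray}`;
}
```

Inside the crawler callback this could be used as `console.log(summarizeResponse(hostname, res.statusCode, res.headers))`. The CF-RAY value is what a site owner would need to look up the blocked request on their side.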
