
NodeJS | Waiting for redirection of a page using Request module

When accessing http://www.oneblockdown.it/en/pharrell-williams-x-adidas-originals/footwear-lifestyle/men-unisex/pharrell-williams-x-adidas-originals-human-race-nmd-bb3070/4267 through a normal browser, it often sits on a white page for about 2.5 seconds, runs some JavaScript, automatically refreshes, and then loads the actual product page.

When I request the page using the Node.js Request module (which uses the standard Node.js http/https modules), it returns the HTML of the blank white page instead of waiting and showing the product page. The HTML of the blank page looks like this: https://pastebin.com/ZSeXuZRc

How would I go about making it wait, execute the code as if it were a browser, and then let it refresh and show the product page?

Here is my current code:

    // Assumes the `request` package is installed (npm install request).
    const request = require('request');

    request(
    {
        url: 'https://www.oneblockdown.it/en/pharrell-williams-x-adidas-originals/footwear-lifestyle/men-unisex/pharrell-williams-x-adidas-originals-human-race-nmd-bb3070/4267',
        method: 'GET',
        // Decompress the response, since Accept-Encoding below advertises gzip/deflate.
        gzip: true,
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
            'Host': 'www.oneblockdown.it',
            'Cache-Control': 'max-age=0',
            'Upgrade-Insecure-Requests': '1',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Referer': 'http://www.oneblockdown.it/en/footwear',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'en-US,en;q=0.9,la;q=0.8'
        }
    },
    function(err, res, body) {
        if (err || res.statusCode !== 200) {
            // On a network error `err` is set; otherwise report the unexpected status code.
            console.log(`FAILED | ${err || 'HTTP ' + res.statusCode}`);
        } else {
            // Parse the body here. It should be the product page, but it is the blank white page's HTML with JavaScript.
        }
    });

Unfortunately, you cannot do so with Request. The response contains exactly what the server returns when you send a request to that endpoint; Request does not execute any JavaScript in it. If there is JavaScript that runs once the DOM has loaded, e.g. via window.onload or document.ready, which is the case for this web page, you will not get those changes.
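Since the body you receive is just the raw markup, you can at least detect inside the callback that you got the interstitial rather than the product page. Here is a minimal sketch, assuming the blank page has no visible text outside its script tags; the helper name `isScriptOnlyPage` is hypothetical and not part of Request, and the regex-based stripping is a rough heuristic rather than a real HTML parser:

```javascript
// Hypothetical helper: returns true when the HTML contains no visible text
// outside <script>, <style>, <noscript> and <title> elements, which is how
// a JavaScript-driven "blank" page typically looks to a plain HTTP client.
function isScriptOnlyPage(html) {
  const visibleText = html
    .replace(/<!--[\s\S]*?-->/g, '')                                  // drop HTML comments
    .replace(/<(script|style|noscript|title)\b[\s\S]*?<\/\1\s*>/gi, '') // drop non-visible elements
    .replace(/<[^>]+>/g, '')                                          // drop remaining tags
    .replace(/&nbsp;/gi, ' ')
    .trim();
  return visibleText.length === 0;
}

// Made-up sample bodies for illustration only (not the real site's HTML).
const interstitial =
  '<html><head><script>setTimeout(function () { location.reload(); }, 2500);</script></head><body></body></html>';
const productPage = '<html><body><h1>Product</h1></body></html>';

console.log(isScriptOnlyPage(interstitial)); // true
console.log(isScriptOnlyPage(productPage));  // false
```

In the callback above you could call `isScriptOnlyPage(body)` in the `else` branch and treat a `true` result as a failure. This only tells you that the scripts were not run; it does not run them.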

This would be a problem even for a web crawler. That is why search engine optimization guidelines recommend serving all important information without relying on scripting, and leaving UI-related tasks such as styling and interactions for after the page has loaded.

