node - make axios requests until a condition is true

I'm trying to use a variable to control a while loop that calls a method. My objective is to fetch all the images until the has_next_page variable is true. This parameter is returned by the online API after each request. Unfortunately, the code didn't work as expected.

class Scraper {
    constructor() {}

    question(inputText) {
        rl.setPrompt(inputText);
        rl.prompt();
        return new Promise( (resolve, reject) => {
            let answer;
            rl.on('line', (input) => {
                answer = input;
                rl.close();
            });
            rl.on('close', () => {
                resolve(answer);
            });
        })
    }

    startFetch(username) {
        this.username = String(username); 
        return new Promise( (resolve, reject) => {
            axios({
                url: `https://www.instagram.com/${this.username}/?__a=1`
            }).then( (response) => {
                //response.data.graphql.user);
                userId = response.data.graphql.user.id;
                //totalMedia = response.data.graphql.user.edge_owner_to_timeline_media.count
                if( response.data.graphql.user.edge_owner_to_timeline_media.page_info.has_next_page ){
                    currCursor = response.data.graphql.user.edge_owner_to_timeline_media.page_info.end_cursor;
                }
                has_next_page = response.data.graphql.user.edge_owner_to_timeline_media.page_info.has_next_page;
                //let nodes = response.data.graphql.user.edge_owner_to_timeline_media.edges.length;
                response.data.graphql.user.edge_owner_to_timeline_media.edges.map( (item, index) => {
                    this.processLink(item.node.display_url, index);
                });
                resolve();
            });
        });
    }

    fetchNextPage(userId) {
        axios({
            method: 'GET',
            baseURL: 'https://www.instagram.com/graphql/query/',
            params: {
              query_hash: '42323d64886122307be10013ad2dcc44',
              variables: { 
                id: userId, 
                first: "12",  
                after: currCursor
              } 
            }
          }).then( (response) => {
            console.log(response.data.data.user.edge_owner_to_timeline_media.edges[0].node)
            //totalMedia = response.data.data.user.edge_owner_to_timeline_media.count
            if( response.data.data.user.edge_owner_to_timeline_media.page_info.has_next_page ){
                currCursor = response.data.data.user.edge_owner_to_timeline_media.page_info.end_cursor;
            }
          });
    }

    processLink(imageURI, n) {
        let filename = path.format({dir: destinationPath, base: `${n}.jpg`});
        let file = fs.createWriteStream(filename);
        https.get(imageURI, (res) => {
            res.pipe(file);
        });
    }
}

const ig = new Scraper();

// ref url: https://www.instagram.com/profile/?__a=1
// I think the next part of the code is a bit messed up?
ig.question('Username: ').then( (profileURI) => {
// get the first cursor from ig api, if has_next_page is true fetch next page
    ig.startFetch(profileURI).then( () => {
// will the while loop cause a memory error?
        while( has_next_page ){
            ig.fetchNextPage(userId);
        }
    });
});

I've created a simple class in my node CLI script. How can I call fetchNextPage() correctly until the defined variable is true? I get a memory error:

<--- Last few GCs --->

[38139:0x105002a00]    66288 ms: Scavenge 2026.7 (2071.0) -> 2022.1 (2072.7) MB, 21.9 / 0.0 ms  (average mu = 0.711, current mu = 0.384) allocation failure 
[38139:0x105002a00]    66329 ms: Scavenge 2028.3 (2072.7) -> 2023.5 (2074.0) MB, 15.0 / 0.0 ms  (average mu = 0.711, current mu = 0.384) allocation failure 
[38139:0x105002a00]    66375 ms: Scavenge 2029.7 (2074.0) -> 2024.9 (2091.2) MB, 10.8 / 0.0 ms  (average mu = 0.711, current mu = 0.384) allocation failure 


<--- JS stacktrace --->

FATAL ERROR: MarkCompactCollector: young object promotion failed Allocation failed - JavaScript heap out of memory
Segmentation fault: 11

UPDATE

I've solved the first issue with fetching the content. Now I'm trying to work out how to avoid a 429 server error when fetching each individual image link using the processLink method. This is the error I get when the image URLs are fetched.

node:events:353
      throw er; // Unhandled 'error' event
      ^

Error: getaddrinfo ENOTFOUND scontent-mxp1-1.cdninstagram.com
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:67:26)
Emitted 'error' event on ClientRequest instance at:
    at TLSSocket.socketErrorListener (node:_http_client:486:9)
    at TLSSocket.emit (node:events:376:20)
    at emitErrorNT (node:internal/streams/destroy:188:8)
    at emitErrorCloseNT (node:internal/streams/destroy:153:3)
    at processTicksAndRejections (node:internal/process/task_queues:80:21) {
  errno: -3008,
  code: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'scontent-mxp1-1.cdninstagram.com'
}

This is the reworked code:


#!/usr/bin/env node

const axios = require('axios');
const path = require('path');
const https = require('https');
const fs = require('fs');
const rl = require('readline').createInterface({
    input: process.stdin,
    output: process.stdout
});

const destinationPath = path.format({dir: __dirname, base: 'scraped-profiles'});
let collectedLinks = [];

class Scraper {

    constructor() {}

    question(inputText) {
        rl.setPrompt(inputText);
        rl.prompt();
        return new Promise( (resolve, reject) => {
            let answer;
            rl.on('line', (input) => {
                answer = input;
                rl.close();
            });
            rl.on('close', () => {
                resolve(answer);
            });
        })
    }

    startFetch(username) {
        this.username = String(username); 
        return axios({
                url: `https://www.instagram.com/${this.username}/?__a=1`
            }).then( (response) => {
                console.log(response.data.graphql);

                this.user_id = response.data.graphql.user.id;
                this.has_next_page = response.data.graphql.user.edge_owner_to_timeline_media.page_info.has_next_page;
                response.data.graphql.user.edge_owner_to_timeline_media.edges.map( (item) => {
                    collectedLinks.push(item.node.display_url);
                });
                if( this.has_next_page ){
                    this.currCursor = response.data.graphql.user.edge_owner_to_timeline_media.page_info.end_cursor;
                    this.fetchNextPage();
                } else {
                    console.log('Completed');
                }
            });
        
    }

    fetchNextPage() {
        return axios({
            method: 'GET',
            baseURL: 'https://www.instagram.com/graphql/query/',
            params: {
              query_hash: '42323d64886122307be10013ad2dcc44',
              variables: { 
                id: this.user_id, 
                first: "12",  
                after: this.currCursor
              } 
            }
        }).then( (response) => {
            console.log(response.data.data)

            if( typeof response.data.data !== 'undefined' ){
                if( response.data.data.user.edge_owner_to_timeline_media.page_info.has_next_page ){
                    this.currCursor = response.data.data.user.edge_owner_to_timeline_media.page_info.end_cursor;
                
                    response.data.data.user.edge_owner_to_timeline_media.edges.map( (item) => {
                        collectedLinks.push(item.node.display_url);
                    });
                    return this.fetchNextPage();
                }
            }
        });
    }

    processLink() {
        collectedLinks.forEach( (link, i) => {
            let filename = path.format({dir: destinationPath, base: `${i}.jpg`});
            let file = fs.createWriteStream(filename);
            https.get(link, (res) => {
                res.pipe(file);
            });
        });
    }
}

const ig = new Scraper();

// https://www.instagram.com/username/?__a=1

ig.question('Username: ').then( (answer) => {
    return ig.startFetch(answer).then( () => {
        return ig.fetchNextPage();
    });
}).then( () => {
    ig.processLink();
    console.log('All done');
});

The problem is that axios requests are asynchronous and the while loop does not pause while the requests are made. The .then() callbacks cannot run until the loop gives control back, so the condition is never updated. As a result you have what is effectively an infinite loop that starts a new request on every iteration, until the process runs out of memory.
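
To make that concrete, here is a minimal sketch of the same situation. It assumes a hypothetical `fakeRequest()` in place of the axios call, and caps the loop at 1000 iterations so that it terminates; the loop in the question has no such cap.

```javascript
// A bounded stand-in for the question's `while (has_next_page)` loop.
// `fakeRequest` is hypothetical and only imitates an axios call.
let hasNextPage = true;
let requestsStarted = 0;

function fakeRequest() {
    requestsStarted += 1;
    // The promise is already resolved, but its `.then()` callback is queued
    // and cannot run until the synchronous loop below has finished.
    return Promise.resolve({ has_next_page: false }).then((page) => {
        hasNextPage = page.has_next_page;
    });
}

// Capped so the demo ends; without the cap this would spin forever.
for (let i = 0; i < 1000 && hasNextPage; i++) {
    fakeRequest();
}

const flagAfterLoop = hasNextPage;
console.log(requestsStarted); // 1000: the flag never changed during the loop
console.log(flagAfterLoop);   // true

setTimeout(() => {
    console.log(hasNextPage); // false: the callbacks only ran after the loop
}, 0);
```

Every iteration starts a request, yet none of the responses is processed until the loop is over, which is why the flag cannot stop it.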

A better approach would be to call fetchNextPage() recursively from within the axios .then() until the has_next_page condition is met. I think you have the condition backwards in the question and actually want to keep going until it is no longer true, but that is an assumption.

The basics, using a bit of pseudo code for brevity:

fetchNextPage(userId) {
      // return the axios promise so it can create promise chain
      return  axios({
            ....
          }).then( (response) => {
              // store current data from this response

              // decide what to do next
              if(!nextPageConditionMet){
                  // return another axios promise for another page to current `then()`
                  return this.fetchNextPage(userId)
              }else{
                 // finally return all the pages
                 return storedPageCollection
              }                
          });
    }


ig.question('Username: ').then((profileURI) => {
  // keep returning promises to each `then()`
  return ig.startFetch(profileURI).then(() => {
    //Remove while loop

    // return promise that won't be resolved until the next page 
    // condition is met in recursive calls made inside the function 
    return ig.fetchNextPage(userId);

  });
}).then((pageCollection) => {
        // do something with  the final pageCollection
        // that was returned in final `then()` of `fetchNextPage()`
        console.log('All done!')
 }).catch(err => console.log('Ooops something failed in the chain', err))
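
For a version that can be run without hitting the real API, here is a rough sketch of the same idea. It assumes a hypothetical `fakeFetchPage(cursor)` that stands in for the axios request and returns `items`, `has_next_page` and `end_cursor`; those names and the `PAGES` data are made up for illustration. It shows the recursive promise chain and, as an alternative, an `async`/`await` loop, which is the one case where a while loop does wait for each request.

```javascript
// Hypothetical paged API: three pages linked by cursors, standing in for
// the axios call to the GraphQL endpoint.
const PAGES = {
    start: { items: ['a.jpg', 'b.jpg'], has_next_page: true, end_cursor: 'c1' },
    c1: { items: ['c.jpg'], has_next_page: true, end_cursor: 'c2' },
    c2: { items: ['d.jpg'], has_next_page: false, end_cursor: null },
};

function fakeFetchPage(cursor) {
    // Resolve asynchronously, like a network request would.
    return new Promise((resolve) => {
        setTimeout(() => resolve(PAGES[cursor]), 10);
    });
}

// Recursive promise chain: each `then()` returns the promise for the next page.
function collectRecursive(cursor = 'start', collected = []) {
    return fakeFetchPage(cursor).then((page) => {
        // Store this page's items first, so the last page is not skipped.
        collected.push(...page.items);
        if (page.has_next_page) {
            return collectRecursive(page.end_cursor, collected);
        }
        return collected;
    });
}

// Same logic with async/await: `await` suspends the function until the
// response arrives, so the loop condition is re-checked with fresh data.
async function collectWithLoop() {
    const collected = [];
    let cursor = 'start';
    let hasNextPage = true;
    while (hasNextPage) {
        const page = await fakeFetchPage(cursor);
        collected.push(...page.items);
        hasNextPage = page.has_next_page;
        cursor = page.end_cursor;
    }
    return collected;
}

collectRecursive().then((links) => console.log('recursive:', links));
collectWithLoop().then((links) => console.log('loop:', links));
```

Both functions resolve with the items of all three pages. Which one to use is mostly a matter of taste; the loop version tends to read more like the code in the question.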
