
How to get links from multiple pages in a single array

I have working code that collects all product links with at least a 20% discount from multiple pages. The only problem is that it returns the links in a separate array for each page. I would like it to return the links from all pages in a single array, which I can then pass to another function. I tried declaring an array, var all_links = [], pushing the links from each page into it, and then returning it with return all_links, as I had seen in a simpler example. That did not work here, and I lack the experience to see why: I started learning the basics three weeks ago. I would be very grateful for help with the complete code.

const puppeteer = require('puppeteer')
const minDiscount = 20;

async function getLinks() {
    const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: null,
    });
    const page = await browser.newPage();

    const url = 'https://www.mytoys.de/spielzeug-spiele/holz/';

    await page.goto(url);

    // step through the pages while a "next" link exists; on each page,
    // page.$$ returns the products as an array of ElementHandle
    while(await page.$(".pager__link--next")){
        await page.waitForSelector(".pager__link--next")
        await page.waitForTimeout(1000);
        await page.click('.pager__link--next')
        await page.waitForTimeout(1500);
        const products = await page.$$('.prod-grid.js-prod-grid .prod-grid__item.js-prod-grid_item');
        const proms = await Promise.allSettled(
            products.map(async (prod) => {
                // searching for a discount on each product
                const disc = await prod.$$eval(
                    '.prod-grid.js-prod-grid .prod-flag.prod-flag-sale',
                    (discount) =>
                        discount.map((discItem) =>
                            discItem.innerText.replace(/[^0-9.]/g, '').replace(/\D+/g,'0')
                        )
                );
                // if it has a discount
                if (disc.length > 0) {
                    // we parse the discount to Integer type to compare it to minDiscount
                    const discountInt = parseInt(disc[0], 10);
                    if (discountInt >= minDiscount) {
                        // we get the link of the product
                        const link = await prod.$$eval('.prod-grid.js-prod-grid .prod-tile__link.js-prodlink', (allAs) => allAs.map((a) => a.href));
                        if (link.length > 0) {
                            // return the link of the product
                            return link[0];
                        }
                    }
                }
                return null;
            })
        );
        const bulkArray = proms.map((item) => {
            if (item.status === 'fulfilled') return item.value;
            return null; // a rejected promise carries no link
        });
        const endArray = bulkArray.filter(item => item !== null);
        console.log(endArray);
    }
}
    
getLinks();

An example of the result I am currently obtaining

[
  'https://www.mytoys.de/erzi-kinderwurst-sortiment-spiellebensmittel-6749036.html',
  'https://www.mytoys.de/chr-tanner-spiellebensmittel-wurststaender-1031946.html',
  'https://www.mytoys.de/hape-xylophon-und-hammerspiel-2503719.html',
  'https://www.mytoys.de/erzi-kinderparty-spiellebensmittel-6749035.html',
]
[
  'https://www.mytoys.de/brio-holzeisenbahnset-landleben-5501952.html',
  'https://www.mytoys.de/brio-brio-33277-bahn-ir-reisezug-set-4592516.html',
  'https://www.mytoys.de/brio-parkhaus-strassen-schienen-3175226.html',
  'https://www.mytoys.de/mytoys-steckwuerfel-12-tlg-11389814.html',
  'https://www.mytoys.de/brio-schienen-und-weichensortiment-1758325.html',
]
[
  'https://www.mytoys.de/hape-grosser-baukran-4141517.html',
  'https://www.mytoys.de/noris-mein-buntes-tuermchenspiel-3421170.html',
  'https://www.mytoys.de/goki-ziehtier-schaf-suse-2488933.html',
  'https://www.mytoys.de/eichhorn-colorsoundzug-mit-licht-1521635.html',
]

An example of the result I would like to obtain

[
  'https://www.mytoys.de/erzi-kinderwurst-sortiment-spiellebensmittel-6749036.html',
  'https://www.mytoys.de/chr-tanner-spiellebensmittel-wurststaender-1031946.html',
  'https://www.mytoys.de/hape-xylophon-und-hammerspiel-2503719.html',
  'https://www.mytoys.de/erzi-kinderparty-spiellebensmittel-6749035.html',
  'https://www.mytoys.de/brio-holzeisenbahnset-landleben-5501952.html',
  'https://www.mytoys.de/brio-brio-33277-bahn-ir-reisezug-set-4592516.html',
  'https://www.mytoys.de/brio-parkhaus-strassen-schienen-3175226.html',
  'https://www.mytoys.de/mytoys-steckwuerfel-12-tlg-11389814.html',
  'https://www.mytoys.de/brio-schienen-und-weichensortiment-1758325.html',
  'https://www.mytoys.de/hape-grosser-baukran-4141517.html',
  'https://www.mytoys.de/noris-mein-buntes-tuermchenspiel-3421170.html',
  'https://www.mytoys.de/goki-ziehtier-schaf-suse-2488933.html',
  'https://www.mytoys.de/eichhorn-colorsoundzug-mit-licht-1521635.html',
]
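  1. Declare a new array for collecting the links before your loop:
const allLinks = []; // <--
while(await page.$(".pager__link--next")){ ... }
  2. Push each page's links into it:
...
const endArray = bulkArray.filter(item => item !== null);
console.log(endArray);
allLinks.push(endArray); // <-- adds one nested array per page
  3. Return the flattened result after the loop finishes:
async function getLinks() {
  ...
  return allLinks.flat(); // <-- merges the per-page arrays into one
}

// The script loads puppeteer with require() (CommonJS), where top-level await
// is not available, so read the result with .then() or inside an async function
getLinks().then((links) => console.log(links)); // result array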
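
The accumulation pattern can be tried without a browser. The following is a minimal sketch, not the site-specific scraper: `pagesOfLinks`, `collectWithFlat` and `collectWithSpread` are hypothetical names, and the example.com URLs stand in for what each loop iteration produces as `endArray`. It shows the push-then-flat approach next to an alternative that spreads each page's links so the array never becomes nested.

```javascript
// Hypothetical per-page results, one inner array per loop iteration
const pagesOfLinks = [
  ['https://example.com/a', 'https://example.com/b'],
  [], // a page with no qualifying discounts
  ['https://example.com/c'],
];

// Variant 1: push each page's array, then flatten once at the end
function collectWithFlat(pages) {
  const allLinks = [];
  for (const endArray of pages) {
    allLinks.push(endArray); // allLinks is nested at this point
  }
  return allLinks.flat(); // flat() removes one level of nesting
}

// Variant 2: spread each page's links so allLinks stays flat throughout
function collectWithSpread(pages) {
  const allLinks = [];
  for (const endArray of pages) {
    allLinks.push(...endArray);
  }
  return allLinks;
}

console.log(collectWithFlat(pagesOfLinks));
console.log(collectWithSpread(pagesOfLinks));
```

Both variants return the same three links in page order. The spread form avoids the final flat() call; the flat() form keeps the per-page grouping available until the end, which may be convenient for logging.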

Refs: Array.prototype.flat()
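
One detail worth checking when filtering Promise.allSettled() results: a map callback that returns nothing for rejected entries produces undefined, and a `!== null` filter lets undefined through. The sketch below uses hand-written objects in the shape that Promise.allSettled() resolves to, so it runs without a browser; `settled`, `mapThenFilterNull` and `keepFulfilledLinks` are hypothetical names chosen for illustration.

```javascript
// Hand-written entries in the shape Promise.allSettled() resolves to
const settled = [
  { status: 'fulfilled', value: 'https://example.com/a' },
  { status: 'fulfilled', value: null }, // product without a qualifying discount
  { status: 'rejected', reason: new Error('element detached') },
];

// Map first, then drop nulls: the rejected entry becomes undefined and survives
function mapThenFilterNull(results) {
  return results
    .map((item) => {
      if (item.status === 'fulfilled') return item.value;
    })
    .filter((item) => item !== null);
}

// Filter on status and value first, then extract the links
function keepFulfilledLinks(results) {
  return results
    .filter((item) => item.status === 'fulfilled' && item.value !== null)
    .map((item) => item.value);
}

console.log(mapThenFilterNull(settled).length);
console.log(keepFulfilledLinks(settled));
```

With this input the first function returns two entries, one of them undefined, while the second returns only the real link.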


 