简体   繁体   English

Object 在承诺有时间解决之前返回

[英]Object is returned before promises have time to resolve

I am trying to build a web scraper to get informations about some products and store them inside a database.我正在尝试构建一个 web 刮板来获取有关某些产品的信息并将它们存储在数据库中。 I'm getting the HTML source code with Nightmare (because javascript code has to run on the server before the page content is created) then I'm parsing that source with Cheerio.我正在使用 Nightmare 获取 HTML 源代码(因为 javascript 代码必须在创建页面内容之前在服务器上运行)然后我使用 Cheerio 解析该源代码。 Once I do the parsing there are some images I have to download for the products.完成解析后,我必须为产品下载一些图像。 I have a simple download function and, based on if the image which I'm trying to download is available or not on the server, I'd like to return a string (or an array of strings) containing either the image name (which I downloaded) or a default image name from my computer.我有一个简单的下载 function 并且根据我尝试下载的图像在服务器上是否可用,我想返回一个包含图像名称的字符串(或字符串数组)(我下载的)或我计算机上的默认图像名称。 I tried calling the download function as a promise, I tried passing Promise.all() when I know there are multiple images to download, but to no avail.我尝试将下载 function 称为 promise,当我知道要下载多个图像时,我尝试传递 Promise.all(),但无济于事。 While I'm positive my code is working (the images are downloaded as should, the final object looks great at almost every property and value), it is the images properties fields which, when I'm printing the object to the console, still holds [Promise] / [ Promise { } ] and I'm not quite sure how to solve this matter.虽然我很肯定我的代码正在运行(图像已按原样下载,最终的 object 在几乎每个属性和值上看起来都很棒),但当我将 object 打印到控制台时,图像属性字段仍然持有 [Promise] / [ Promise { } ] 我不太确定如何解决这个问题。 I'm positive those promises resolve, but they're not resolved when I'm outputting the resulting object to the console.我很肯定这些承诺会解决,但是当我将生成的 object 输出到控制台时,它们并没有解决。 And that's a problem, 'cause I have to pass that object to be stored in the database and I don't think they'll be resolved then.这是一个问题,因为我必须通过将 object 存储在数据库中,我认为它们不会得到解决。

The code (minus the exact links) is down below:代码(减去确切的链接)如下:

const cheerio = require('cheerio')
const nightmare = require('nightmare')()
const download = require('image-downloader')

const settings = new function() {
    this.baseURL = 'https://baseurl.whatever'
    this.urlSearch = `${this.baseURL}/Product/Search?keyword=`
    this.urlVariant = 'https://cdn.baseurl.whatever/Variant/'
    this.urlProduct = 'https://cdn.baseurl.whatever/Product/'
    this.imgPath = './img/'
}

var review_id = 0

function downloadImage(url, filepath, success, error) {
    return download.image({ url, dest: filepath }).then(success, error)
}

const url = 'https://someurl.nevermind.meh/product?pid=50M3NUMB3R',
      code = '50M3C0D3'

async function scrapeProduct(code) {
    const product = await nightmare.goto(url)
        .wait()
        .evaluate(() => document.body.innerHTML)
        .end()
        .then(body => console.log(loadProduct(body, code)))
        .catch(err => console.log(`There was an error: [${err}]`))
}

function loadProduct(body, code) {
    $ = cheerio.load(body)

    return {
        title: $('li.LongName').text().trim(),
        category: $('a#categoryTitleLink').text().trim(),
        min_price: parseFloat($('span.MinPrice').text()),
        max_price: parseFloat($('span.MaxPrice')?.text()) || parseFloat($('span.MinPrice').text()),
        points: parseFloat($('div.AddtoCartUnderText span').text()),
        variants: [...$('div.productDetailClassicRnd')].map(variant => {
            const $field = $(variant).find('input'),
                  item_code = $field.attr('item_code')

            if (item_code.split('-')[0] == code) return null

            return {
                code: item_code.split('-')[0],
                title: $field.attr('item_name'),
                image: downloadImage(
                    `${settings.urlVariant}${item_code.replace(' ', '%20')}`,
                    `${settings.imgPath}${item_code}`,
                    result => result.filename.split('/').reverse()[0],
                    _ => 'variant_default-VC.jpg'
                )
            }
        }).filter(variant => variant !== null),
        images: [...$('img#imgProduct')].map(image => {
            const $image = $(image),
                  source = $image.attr('src')

            return downloadImage(
                source, 
                `${settings.imgPath}${source.split('/').reverse()[0]}`,
                result => result.filename.split('/').reverse()[0],
                _ => 'product_default.jpg'
            )
        }),
        other_images: [...$('img.productDetailOtherIMG')].map(image => {
            const $image = $(image),
                  source = $image.attr('src')

            // Check if the other image is not a default one
            if (/default_\d{1,2}/.test(source)) return null

            return downloadImage(
                source, 
                `${settings.imgPath}${source.split('/').reverse()[0]}`,
                result => result.filename.split('/').reverse()[0],
                _ => null
            )
        }).filter(other_image => other_image !== null),
        how_to_use: $('span#HowToUse p')?.text().trim() || "",
        technical_description: $('span#TechnicalDescription p')?.text().trim() || "",
        product_description: $('span#ProductDescription p')?.text().trim() || "",
        bought_with: [...$('a.redirectProductId')].map(item => $(item).attr('href').match(/=(\d+)$/)[1]),
        rank: $('div.productAverageMainDiv').find('i.activeStar').length,
        reviews_count: parseInt($('span#spnReviewCount').text()),
        reviews: [...$('div.customerReviewsMainDiv')].map(review => {
            const $review = $(review)

            return {
                id: ++review_id,
                author: $review.find('div.customerName').text().trim(),
                posted_at: $review.find('div.starIconsForReviews span').text().trim(),
                rank: $review.find('span.productAverageMainDiv').find('i.activeStar').length,
                message: $review.find('div.customerReviewDetail span').text().trim()
            }
        })
    }
}

scrapeProduct(code)

I can't even filter the null values from my resulting array of image names because those promises don't resolve once I reach the filter function.我什至无法从生成的图像名称数组中过滤 null 值,因为一旦我到达过滤器 function,这些承诺就无法解决。 I somehow was under the impression that不知何故,我的印象是

images: downloadImage(
    URL,
    filepath,
    resolve() {},
    reject() {}
)

will wait until the downloadImage function returns a value to the image property and then the filter function will be executed.将等到 downloadImage function 将值返回给 image 属性,然后过滤器 function 将被执行。 On the other hand, given that I guess the execution flows to filter function long before my downloadImage function has any chance of resolving the promise, I'd chain a.then() to the downloadImage, but I can't, because the downloadImage is inside the return of the map() function - which is the one followed by the.filter() function in the code.另一方面,考虑到我猜想执行流程过滤 function 早在我的 downloadImage function 有任何机会解决 ZB321DE3BDC299EC807E9F795D7D9DEEBImage,但我可以下载链接。是在 map() function 的返回内 - 这是代码中的.filter() function 后面的那个。

Any help would be much appreciated!任何帮助将非常感激! Thank you!谢谢!

PS: I'm pretty sure there's something elementary (logical) which I'm overseeing or I didn't understand properly and I apologize for wasting your time, but I'm struggling with this thing for two days now and I don't seem to have any more ideas ^_^ PS:我很确定我正在监督一些基本(逻辑)的事情,或者我没有正确理解,我为浪费你的时间而道歉,但我现在在这件事上苦苦挣扎了两天,我没有好像还有什么想法^_^

You are getting an array of Promises returned by the map() method, so you will need to use Promise.all() or one of its variants.您将获得 map() 方法返回的 Promise 数组,因此您需要使用Promise.all()或其变体之一。

For example, here you get the array of promises of "images", then you use Promise.all() to wait for all promises to be resolved, and finally you chain a then() to use the values.例如,在这里你得到“图像”的承诺数组,然后你使用 Promise.all() 等待所有承诺被解决,最后你链接一个 then() 来使用这些值。

    const imagesPromises = [...$('img#imgProduct')].map(image => {
        const $image = $(image),
                source = $image.attr('src')

        return downloadImage(
            source, 
            `${settings.imgPath}${source.split('/').reverse()[0]}`,
            result => result.filename.split('/').reverse()[0],
            _ => 'product_default.jpg'
        )
    })

    return Promise.all(imagesPromises)
        .then(images => {
            return {
                images,
                ...
            }
        })

And here you have a possible implementation:在这里你有一个可能的实现:

function loadProduct(body, code) {
    $ = cheerio.load(body)

    const result = {
        title: $('li.LongName').text().trim(),
        category: $('a#categoryTitleLink').text().trim(),
        min_price: parseFloat($('span.MinPrice').text()),
        max_price: parseFloat($('span.MaxPrice')?.text()) || parseFloat($('span.MinPrice').text()),
        points: parseFloat($('div.AddtoCartUnderText span').text()),
        variants: [...$('div.productDetailClassicRnd')].map(variant => {
            const $field = $(variant).find('input'),
                  item_code = $field.attr('item_code')

            if (item_code.split('-')[0] == code) return null

            return {
                code: item_code.split('-')[0],
                title: $field.attr('item_name'),
                image: downloadImage(
                    `${settings.urlVariant}${item_code.replace(' ', '%20')}`,
                    `${settings.imgPath}${item_code}`,
                    result => result.filename.split('/').reverse()[0],
                    _ => 'variant_default-VC.jpg'
                )
            }
        }).filter(variant => variant !== null),
        how_to_use: $('span#HowToUse p')?.text().trim() || "",
        technical_description: $('span#TechnicalDescription p')?.text().trim() || "",
        product_description: $('span#ProductDescription p')?.text().trim() || "",
        bought_with: [...$('a.redirectProductId')].map(item => $(item).attr('href').match(/=(\d+)$/)[1]),
        rank: $('div.productAverageMainDiv').find('i.activeStar').length,
        reviews_count: parseInt($('span#spnReviewCount').text()),
        reviews: [...$('div.customerReviewsMainDiv')].map(review => {
            const $review = $(review)

            return {
                id: ++review_id,
                author: $review.find('div.customerName').text().trim(),
                posted_at: $review.find('div.starIconsForReviews span').text().trim(),
                rank: $review.find('span.productAverageMainDiv').find('i.activeStar').length,
                message: $review.find('div.customerReviewDetail span').text().trim()
            }
        })
    }

    const imagesPromises = [...$('img#imgProduct')].map(image => {
        const $image = $(image),
                source = $image.attr('src')

        return downloadImage(
            source, 
            `${settings.imgPath}${source.split('/').reverse()[0]}`,
            result => result.filename.split('/').reverse()[0],
            _ => 'product_default.jpg'
        )
    })
    const otherImagesPromises = [...$('img.productDetailOtherIMG')].map(image => {
        const $image = $(image),
              source = $image.attr('src')

        // Check if the other image is not a default one
        if (/default_\d{1,2}/.test(source)) return null

        return downloadImage(
            source, 
            `${settings.imgPath}${source.split('/').reverse()[0]}`,
            result => result.filename.split('/').reverse()[0],
            _ => null
        )
    })

    return Promise.all(imagesPromises)
        .then(images => {
            result.images = images
            return Promise.all(otherImagesPromises)
        })
        .then(otherImages => {
            result.other_images = otherImages.filter(other_image => other_image !== null)
            return result
        }) 
}

scrapeProduct(code).then(product => console.log(product))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM