async.series and async.each not working as expected
I am attempting to build a web scraper with Node.js that searches a website's HTML for images, caches the image source URLs, and then finds the image with the largest file size.
The problem I am having is that deliverLargestImage() fires before the array of image source URLs has been looped through to get their file sizes. I am using both async.series and async.each to try to make this work properly.

How do I force deliverLargestImage() to wait until the async.each inside getFileSizes() is finished?
var async, request, cheerio, gm;
async = require('async');
request = require('request');
cheerio = require('cheerio');
gm = require('gm').subClass({ imageMagick: true });

function imageScraper () {
  var imgSources, largestImage;
  imgSources = [];
  largestImage = {
    url: '',
    size: 0
  };

  async.series([
    function getImageUrls (callback) {
      request('http://www.example.com/', function (error, response, html) {
        if (!error && response.statusCode === 200) {
          var $ = cheerio.load(html);
          $('img').each(function (i, elem) {
            var src = $(this).attr('src');
            // Skip <img> tags without a src attribute; keep only absolute http:// URLs.
            if (src && src.indexOf('http://') > -1) {
              imgSources.push(src);
            }
          });
        }
        callback();
      });
    },
    function getFileSizes (callback) {
      async.each(imgSources, function (img, _callback) {
        gm(img).filesize(function (err, value) {
          checkSize(img, value);
          _callback();
        });
      });
      // Problem: this runs immediately, before any filesize() callback has fired.
      callback();
    },
    function deliverLargestImage (callback) {
      callback();
      // Note: async.series ignores the return value of a plain callback-style task.
      return largestImage;
    }
  ]);

  function checkSize (imgUrl, value) {
    var r, raw;
    if (value !== undefined) {
      r = /\d+/;
      // Convert to a number so the comparison is numeric, not string-based.
      raw = parseInt(value.match(r)[0], 10);
      if (raw >= largestImage.size) {
        largestImage.url = imgUrl;
        largestImage.size = raw;
      }
    }
  }
}

imageScraper();
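Why the last task runs early can be reproduced without any network access. In this sketch, fakeFilesize is a hypothetical stand-in for gm(img).filesize() that answers after a short timer, and a plain forEach plays the role of async.each, which likewise returns as soon as it has started the iterators, not when they finish.

```javascript
// Hypothetical stand-in for gm(img).filesize(): replies asynchronously, like real I/O.
function fakeFilesize(url, cb) {
  setTimeout(function () { cb(null, url.length + 'KB'); }, 10);
}

var events = [];

// Mirrors the structure of getFileSizes() in the question.
function getFileSizes(imgSources, callback) {
  imgSources.forEach(function (img) {
    fakeFilesize(img, function (err, value) {
      events.push('size of ' + img + ': ' + value);
    });
  });
  callback(); // runs now, before any fakeFilesize reply has arrived
}

getFileSizes(['http://example.com/a.png', 'http://example.com/bb.png'], function () {
  events.push('deliverLargestImage');
});

// Right after the call, only 'deliverLargestImage' has been recorded;
// the two size entries are appended roughly 10 ms later.
console.log(events); // [ 'deliverLargestImage' ]
```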
Try moving the callback() here:
function getFileSizes (callback) {
  async.each(imgSources, function (img, _callback) {
    gm(img).filesize(function (err, value) {
      checkSize(img, value);
      _callback();
    });
  }, function (err) { callback(err); }); /* <-- put here */
  /* callback(); <-- wrong here */
},
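If Promises are an option, the same "wait for every file size, then pick the largest" step can be written without the async library. This is a swapped-in technique (Promise.all), not what the answer above uses. fakeFilesize and filesizeAsync are hypothetical helpers: the first fakes a callback-style size lookup in place of gm, the second wraps it in a Promise so the sketch runs on its own.

```javascript
// Hypothetical callback-style size lookup standing in for gm(img).filesize().
function fakeFilesize(url, cb) {
  setTimeout(function () { cb(null, url.length + 'KB'); }, 10);
}

// Wrap the callback API in a Promise so completion can be waited on.
function filesizeAsync(url) {
  return new Promise(function (resolve, reject) {
    fakeFilesize(url, function (err, value) {
      if (err) reject(err);
      else resolve({ url: url, size: parseInt(value, 10) });
    });
  });
}

var largestImage = null;

function getLargestImage(imgSources) {
  // Promise.all settles only after every lookup has finished (or one fails).
  return Promise.all(imgSources.map(filesizeAsync)).then(function (results) {
    return results.reduce(function (best, cur) {
      return cur.size > best.size ? cur : best;
    }, { url: '', size: 0 });
  });
}

getLargestImage(['http://example.com/a.png', 'http://example.com/bigger.png'])
  .then(function (img) {
    largestImage = img; // assigned only once all sizes are known
    console.log(largestImage);
  })
  .catch(function (err) { console.error(err); });
```

The design point is the same as in the answer: the code that delivers the result lives in the completion step (here the .then handler), never directly after the loop.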
async.each accepts a callback as its third parameter, which is executed once the iterator has finished for every element:
Arguments

arr - An array to iterate over.
iterator(item, callback) - A function to apply to each item in arr. The iterator is passed a callback(err) which must be called once it has completed. If no error has occurred, the callback should be run without arguments or with an explicit null argument.
callback(err) - A callback which is called when all iterator functions have finished, or an error occurs.
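To make that contract concrete, here is a deliberately minimal, hypothetical eachSketch that follows the three documented arguments. It is not the async library's own implementation (the real one covers more cases), but it shows why only the final callback knows that every iterator is done: it counts completions and calls callback(err) at most once.

```javascript
// Minimal sketch of the documented each(arr, iterator, callback) contract.
function eachSketch(arr, iterator, callback) {
  var remaining = arr.length;
  var finished = false;

  // Guard so the final callback runs at most once.
  function done(err) {
    if (finished) return;
    finished = true;
    callback(err || null);
  }

  if (remaining === 0) return done(null);

  arr.forEach(function (item) {
    iterator(item, function (err) {
      if (err) return done(err);       // an error ends the loop early
      remaining -= 1;
      if (remaining === 0) done(null); // every iterator has reported back
    });
  });
}

// Usage: items finish at different times; the final callback waits for all of them.
var seen = [];
var finalError;
eachSketch([30, 10, 20], function (ms, cb) {
  setTimeout(function () { seen.push(ms); cb(); }, ms);
}, function (err) {
  finalError = err;
  console.log('all done:', seen);
});
```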
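Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.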