Scrape multiple websites using NodeJS, Express, Cheerio and Axios
I would like to scrape multiple websites using NodeJS, Express, Cheerio and Axios. I can currently scrape one website and display the information in the HTML page. But when I try to scrape multiple websites looking for the same element, it doesn't go through the whole forEach (it stops after 1 cycle). Notice my loop, which doesn't work correctly: urls.forEach(url => {
The 2 files that are the most important:
index.js:
const PORT = 8000
const axios = require('axios')
const cheerio = require('cheerio')
const express = require('express')
const app = express()
const cors = require('cors')
app.use(cors())
const urls = ['https://www.google.nl','https://www.google.de']
// const url = 'https://www.heineken.com/nl/nl/'
app.get('/', function(req, res){
    res.json('Robin')
})
urls.forEach(url => {
    app.get('/results', (req, res) => {
        axios(url)
            .then(response => {
                const html = response.data
                const $ = cheerio.load(html)
                const articles = []
                $('script', html).each(function(){
                    const link = $(this).get()[0].namespace
                    if (link !== undefined) {
                        if (link.indexOf('w3.org') > -1) {
                            articles.push({
                                link
                            })
                        }
                    }
                })
                res.json(articles)
            }).catch(err => console.log(err))
    })
})
app.listen(PORT, () => console.log(`server running on PORT ${PORT}`))
App.js:
const root = document.querySelector('#root')
fetch('http://localhost:8000/results')
    .then(response => {return response.json()})
    .then(data => {
        console.log(data)
        data.forEach(article => {
            const title = `<h3>` + article.link + `</h3>`
            root.insertAdjacentHTML("beforeend", title)
        })
    })
You're registering multiple route handlers for the same route. Express only routes a request to the first matching handler; because that handler sends the response and never calls next(), the handlers registered for the other URLs never run. Move your URL loop inside app.get("/results", ...):
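As a rough illustration of that first-match behavior, here is a toy model of a router. It is not Express's actual implementation: createToyRouter, get and dispatch are hypothetical names made up for this sketch, and the handlers only pretend to scrape. It registers one handler per URL on the same path, as the question does, and shows that only the first one answers.

```javascript
// Toy model of first-match routing. This is NOT Express's real implementation;
// createToyRouter, get and dispatch are hypothetical names for illustration.
function createToyRouter() {
  const handlers = []; // handlers in registration order

  return {
    get(path, handler) {
      handlers.push({ path, handler });
    },
    dispatch(path) {
      const matching = handlers.filter((h) => h.path === path);
      let response;
      let index = 0;
      const res = { json: (body) => { response = body; } };
      // Run the next matching handler only when the current one calls next().
      const next = () => {
        const entry = matching[index++];
        if (entry) entry.handler({ path }, res, next);
      };
      next();
      return response;
    },
  };
}

const router = createToyRouter();
const urls = ["https://www.google.nl", "https://www.google.de"];

// Same shape as the question: one handler per URL, all on the same path.
urls.forEach((url) => {
  router.get("/results", (req, res) => res.json({ scraped: url }));
});

const result = router.dispatch("/results");
console.log(result); // only the first URL's handler produced a response
```

Since no handler calls next(), the second registration is never reached. With a single handler that loops over the URLs itself, the route becomes: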
app.get("/results", async (req, res, next) => {
    try {
        res.json(
            (
                await Promise.all(
                    urls.map(async (url) => {
                        const { data } = await axios(url);
                        const $ = cheerio.load(data);
                        const articles = [];
                        // the document is already loaded into $, so no context argument is needed
                        $("script").each(function () {
                            const link = $(this).get()[0].namespace;
                            if (link !== undefined) {
                                if (link.indexOf("w3.org") > -1) {
                                    articles.push({
                                        link,
                                    });
                                }
                            }
                        });
                        return articles;
                    })
                )
            ).flat() // un-nest each array of articles
        );
    } catch (err) {
        console.error(err);
        next(err); // make sure Express responds with an error
    }
});
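The map / Promise.all / flat pattern used above can be tried on its own without any network access. The sketch below is a minimal stand-in, not the real scraper: scrapeOne, scrapeAll and the fakePages data are hypothetical and replace the axios and cheerio step with canned results.

```javascript
// Hypothetical stand-in for the axios + cheerio step: resolves to the list of
// "articles" found on one page. The data here is made up for illustration.
const fakePages = {
  "https://example.com/a": ["link-a1", "link-a2"],
  "https://example.com/b": ["link-b1"],
};

async function scrapeOne(url) {
  const links = fakePages[url] ?? [];
  return links.map((link) => ({ link }));
}

// Scrape every URL concurrently and merge the per-site arrays into one list.
async function scrapeAll(urls) {
  // Promise.all keeps the results in the same order as the input URLs.
  const perSite = await Promise.all(urls.map((url) => scrapeOne(url)));
  return perSite.flat(); // un-nest one level: one flat array of articles
}

const resultsPromise = scrapeAll(Object.keys(fakePages));
resultsPromise.then((articles) => console.log(articles));
```

Note that Promise.all rejects as soon as any one of its promises rejects, so a single unreachable site makes the whole route respond with an error. If partial results are acceptable, Promise.allSettled is one alternative to consider.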