Scraping multiple websites with the same theme

const PORT = 5000;
import express from "express";
import axios from "axios";
import cheerio from "cheerio";

const app = express();

const tomsHardware = "https://www.tomshardware.com/best-picks/best-gaming-mouse";
const pcGamer = "https://www.pcgamer.com/the-best-gaming-mouse/";

const requestOne = axios.get(tomsHardware);
const requestTwo = axios.get(pcGamer);

const mice = []

app.get('/', (req, res) => {
    res.json('Welcome to my climate change API!');
});

app.get('/mouse', (req, res) => {
    axios.all([requestOne, requestTwo])
        .then((response) => {
            const html = response.data;
            const $ = cheerio.load(html);

            $('.product__title').each(function (index, elem) {
                const title = $(this).text();
                mice.push({
                    title
                });
            });
            res.json(mice)
        }).catch((err) => console.log(err));
}); 

I am trying to scrape both of these websites, but I am getting "object is not iterable". I am also not sure how to scrape both of them together, since they appear to use the same theme and the same class names.

Your response is actually an array of two responses, so you'll need to loop over that array and parse each response's HTML separately:

app.get('/mouse', (req, res) => {
  axios.all([requestOne, requestTwo])
    .then(responses => {
      for (const response of responses) {
        const html = response.data;
        const $ = cheerio.load(html);
  
        $('.product__title').each(function () {
          mice.push({title: $(this).text()});
        });
      }

      res.json(mice);
    })
    .catch(err => console.log(err));
});

Note that const mice = [] is declared outside the handler, so on each request, this array will continually grow with repeated elements. You might want to move it into the request handler closure to rebuild it on every request.
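
As a rough sketch of that last point, the collection step could be pulled into a small function that builds a new array on every call. The names collectTitles and extractTitles are hypothetical, not part of the original code. In the real route, extractTitles would presumably wrap cheerio.load(html) and the '.product__title' selector. Here a plain string splitter stands in for it so the snippet runs without any dependencies:

```javascript
// Sketch only: collectTitles and extractTitles are hypothetical names.
// `responses` is the array that axios.all / Promise.all resolves to,
// one entry per request, each with its HTML in `.data`.
function collectTitles(responses, extractTitles) {
  const mice = []; // local array, rebuilt on every call
  for (const response of responses) {
    for (const title of extractTitles(response.data)) {
      mice.push({ title });
    }
  }
  return mice;
}

// Stand-in data and extractor (assumptions for the demo, not real scraped HTML).
const fakeResponses = [{ data: "Mouse A|Mouse B" }, { data: "Mouse C" }];
const splitTitles = (html) => html.split("|");

console.log(collectTitles(fakeResponses, splitTitles).length); // 3
console.log(collectTitles(fakeResponses, splitTitles).length); // still 3, nothing accumulates
```

Inside the handler this would be called from the .then callback, for example res.json(collectTitles(responses, extractTitles)), so repeated requests return the same list instead of an ever-growing one.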
