
JavaScript: Iterating Over a List of Objects

I'm writing a scraper for Skyscanner just for fun. What I'm trying to do is iterate through the list of all listings and, for each listing, extract its URL.


What I've done so far is get the listing container with $("div[class^='FlightsResults_dayViewItems']"), which returns:

[screenshot of the returned object]
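For reference, `[class^='FlightsResults_dayViewItems']` is a CSS attribute prefix selector: it tests whether the whole `class` attribute string starts with that text, which is why it tolerates a generated suffix on the class name. It stops matching if some other class is listed first in the attribute, whereas the substring form `[class*='...']` matches the text anywhere. The sketch below models both rules in plain JavaScript; the class strings are made-up examples, not Skyscanner's actual markup.

```javascript
// Plain-JS model of two CSS attribute selectors. Both test the whole class
// attribute string, not individual class names. Per the Selectors spec, an
// empty value matches nothing.
const matchesPrefix = (classAttr, value) =>
  value !== "" && classAttr.startsWith(value); // [class^="value"]
const matchesSubstring = (classAttr, value) =>
  value !== "" && classAttr.includes(value); // [class*="value"]

// Hypothetical class attributes: one with a generated suffix, and one where
// another class comes first.
const single = "FlightsResults_dayViewItems__a1b2c";
const multiple = "Wrapper_box__x9 FlightsResults_dayViewItems__a1b2c";

console.log(matchesPrefix(single, "FlightsResults_dayViewItems")); // true
console.log(matchesPrefix(multiple, "FlightsResults_dayViewItems")); // false
console.log(matchesSubstring(multiple, "FlightsResults_dayViewItems")); // true
```

This is the reason a substring selector such as `a[class*="FlightsTicket_link"]` can be the more forgiving choice when class names carry generated suffixes.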

but I'm not sure how to iterate through the returned object and get the URL (/transport/flight/bos...). The pseudocode I have is:

for each listing in $("div[class^='FlightsResults_dayViewItems']") {
     go to the element with class^='EcoTicketWrapper_itineraryContainer'
          go to the element with class^='FlightsTicket_container'
               go to the element with class^='FlightsTicket_link', get its href, and save it in an array
}
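The pseudocode maps onto an ordinary nested loop: for each listing, search its descendants for the next class prefix, and once the link is reached, read its href. Below is a dependency-free sketch in which the page is modeled as a hypothetical tree of plain objects; `node`, `findDescendant` and `collectHrefs` are illustrative names, not cheerio's API. With cheerio, the inner search roughly corresponds to `$(listing).find(...)`.

```javascript
// Toy stand-in for the parsed page: each node has a class attribute,
// optional attributes, and child nodes. All class names and hrefs are hypothetical.
const node = (cls, children = [], attrs = {}) => ({ cls, children, attrs });

// Depth-first search, in document order, for the first descendant whose
// class attribute starts with `prefix` (like [class^=prefix]).
function findDescendant(root, prefix) {
  for (const child of root.children) {
    if (child.cls.startsWith(prefix)) return child;
    const found = findDescendant(child, prefix);
    if (found) return found;
  }
  return null;
}

// The pseudocode as a loop: for each listing, walk down the chain of class
// prefixes; listings where any step is missing are skipped.
function collectHrefs(listings, prefixes) {
  const hrefs = [];
  for (const listing of listings) {
    let current = listing;
    for (const prefix of prefixes) {
      current = current && findDescendant(current, prefix);
    }
    if (current && current.attrs.href) hrefs.push(current.attrs.href);
  }
  return hrefs;
}

const ticket = (href) =>
  node("EcoTicketWrapper_itineraryContainer__x", [
    node("FlightsTicket_container__y", [node("FlightsTicket_link__z", [], { href })]),
  ]);

const listings = [
  node("Listing__a", [ticket("/example/a")]),
  node("Listing__b", [ticket("/example/b")]),
  node("Listing__c"), // no link inside: skipped
];
const prefixes = [
  "EcoTicketWrapper_itineraryContainer",
  "FlightsTicket_container",
  "FlightsTicket_link",
];

console.log(collectHrefs(listings, prefixes)); // [ '/example/a', '/example/b' ]
```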

How would I go about doing this? Side note: I'm using cheerio and jQuery.

Update: I figured out that the CSS selector is:

$("div[class^='FlightsResults_dayViewItems'] > div:nth-child(at_index_i) > div[class^='EcoTicketWrapper_itineraryContainer'] > div[class^='FlightsTicket_container'] > a[class^='FlightsTicket_link']").attr("href")

Now I'm trying to figure out how to loop through the listings and apply the selector to each listing in the loop.

Also, it seems that leaving out div:nth-child(at_index_i) doesn't work. Is there a way around this?

$("div[class^='FlightsResults_dayViewItems'] > div:nth-child(3) > div[class^='EcoTicketWrapper_itineraryContainer'] > div[class^='FlightsTicket_container'] > [class^='FlightsTicket_link']").attr("href")

"/transport/flights/bos/cun/210301/210331/config/10081-2103010815--32733-0-10803-2103011250|10803-2103311225--31722-1-10081-2103312125?adults=1&adultsv2=1&cabinclass=economy&children=0&childrenv2=&destinationentityid=27540602&inboundaltsenabled=false&infants=0&originentityid=27539525&outboundaltsenabled=false&preferdirects=false&preferflexible=false&ref=home&rtn=1"


$("div[class^='FlightsResults_dayViewItems'] > div[class^='EcoTicketWrapper_itineraryContainer'] > div[class^='FlightsTicket_container'] > [class^='FlightsTicket_link']").attr("href")

undefined
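A likely explanation for the `undefined`: `>` is the child combinator, so it only matches direct children. The working selector shows an extra `div` between `FlightsResults_dayViewItems` and each `EcoTicketWrapper_itineraryContainer`, so once `div:nth-child(3) >` is removed, that step finds no direct child and nothing matches (and `.attr()` on an empty set returns `undefined`). Replacing that `>` with a space (the descendant combinator), or writing `> div >` without the `:nth-child`, should match every listing at once. Note that `.attr("href")` only reads the first matched element, so to get every URL you would still iterate with `.map()` or `.each()`. The sketch below models the two combinators on a hypothetical tree of plain objects; `node`, `allDescendants` and `step` are illustrative helpers, not a real selector engine.

```javascript
// Toy tree standing in for the page: the results container holds wrapper
// divs, and each wrapper holds an itinerary container (class names hypothetical).
const node = (cls, children = []) => ({ cls, children });

// Every node below `n`, at any depth, in document order.
const allDescendants = (n) =>
  n.children.flatMap((c) => [c, ...allDescendants(c)]);

// One selector step: ">" (child combinator) looks only at direct children,
// " " (descendant combinator) looks at every depth. Matching mimics [class^=prefix].
function step(nodes, combinator, prefix) {
  const candidates = nodes.flatMap((n) =>
    combinator === ">" ? n.children : allDescendants(n)
  );
  return [...new Set(candidates)].filter((c) => c.cls.startsWith(prefix));
}

const results = node("FlightsResults_dayViewItems__a", [
  node("wrapper", [node("EcoTicketWrapper_itineraryContainer__b")]),
  node("wrapper", [node("EcoTicketWrapper_itineraryContainer__b")]),
]);

// Like "... > div[class^='EcoTicketWrapper_itineraryContainer']": no direct child matches.
console.log(step([results], ">", "EcoTicketWrapper_itineraryContainer").length); // 0
// Like "... div[class^='EcoTicketWrapper_itineraryContainer']" (descendant): both match.
console.log(step([results], " ", "EcoTicketWrapper_itineraryContainer").length); // 2
```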

Here's the function that iterates over the listings and grabs the URL for each one:

const fs = require("fs");
const cheerio = require("cheerio");

// Assumes `browser` is a Puppeteer Browser launched elsewhere in the script.
async function scrapeListingUrl(listingURL) {
  let page;
  try {
    page = await browser.newPage();
    await page.goto(listingURL, { waitUntil: "networkidle2" }); // Wait until network activity settles
    console.log("Finished loading page.");

    const html = await page.evaluate(() => document.body.innerHTML);
    fs.writeFileSync("./listing.html", html); // Keep a copy of the rendered HTML for checking selectors

    const $ = cheerio.load(html); // cheerio parses the HTML and offers a jQuery-like API (it is not jQuery itself)

    // Collect the href of every flight listing link.
    // Note: [class*="..."] is a CSS attribute substring selector (not a regex); it is used because
    // the actual class name has a generated string appended to the end.
    const bookingURLs = $('a[class*="FlightsTicket_link"]')
      .map((i, elem) => $(elem).attr("href")) // return the value; returning console.log(...) would give undefined
      .get();

    console.log(bookingURLs);
    return bookingURLs;
  } catch (error) {
    console.log("Scrape flight url failed.");
    console.log(error);
  } finally {
    if (page) await page.close(); // Close the tab even if scraping failed
  }
}
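A pitfall with `.map()` in this kind of code: the callback has to return the value. Writing `.map((i, elem) => console.log(elem.href))` logs each value but returns `undefined`, and both jQuery's and cheerio's `.map()` leave `undefined` and `null` results out of the set, so `.get()` yields an empty array. With cheerio there is a second catch: `elem` is a parsed node rather than a browser DOM element, so `elem.href` is `undefined` too, while `$(elem).attr("href")` reads the attribute. The sketch below imitates only the undefined-dropping rule, using a hypothetical helper `mapLikeJQuery` on plain objects.

```javascript
// Hypothetical helper mimicking one rule of jQuery's and cheerio's .map():
// a callback result of null or undefined is left out of the output.
// (Both libraries also flatten returned arrays; that part is not modeled here.)
function mapLikeJQuery(items, callback) {
  return items
    .map((item, i) => callback(i, item)) // jQuery-style argument order: (index, element)
    .filter((value) => value !== null && value !== undefined);
}

// Plain objects standing in for anchor elements.
const anchors = [{ href: "/example/a" }, { href: "/example/b" }];

// console.log prints each href but returns undefined, so nothing is kept.
const logged = mapLikeJQuery(anchors, (i, el) => console.log(el.href));
// Returning the value keeps it.
const returned = mapLikeJQuery(anchors, (i, el) => el.href);

console.log(logged); // []
console.log(returned); // [ '/example/a', '/example/b' ]
```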

Using map():

const hrefs = $(selector).map((i, elem) => elem.href).get()

Looking at your code, you are using cheerio rather than jQuery in a browser, so the above does not work (elem.href is only defined on real DOM elements). You just need a basic selector that matches part of the class name, used with querySelectorAll in the page itself (for example inside Puppeteer's page.evaluate, since document does not exist in Node), and map to grab the hrefs.

const links = [...document.querySelectorAll('a[class*="FlightsTicket_link"]')]
    .map(l=>l.href)
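One difference between the two approaches: in the browser, the DOM `href` property (as in `l.href` above) returns the URL already resolved against the page, while reading the raw attribute (cheerio's `.attr("href")` or `getAttribute("href")`) returns a relative path like the `/transport/flights/bos/cun/...` value shown earlier. The sketch below resolves such a path with the standard `URL` class and reads a few details out of it. The base origin `https://www.skyscanner.net` is an assumption (use whichever domain the page was loaded from), and the meaning of the path segments is inferred from that single example URL, so treat `parseListingHref` as a hypothetical helper.

```javascript
// Assumption: the listings were scraped from this origin.
const BASE = "https://www.skyscanner.net";

function parseListingHref(href, base = BASE) {
  const url = new URL(href, base); // resolves a relative path against the base origin
  // Assumed layout, inferred from one example: /transport/flights/<from>/<to>/<outDate>/<inDate>/config/...
  const [, , , from, to, outDate, inDate] = url.pathname.split("/");
  return {
    absolute: url.href,
    from,
    to,
    outDate,
    inDate,
    adults: Number(url.searchParams.get("adults")),
    cabinClass: url.searchParams.get("cabinclass"),
  };
}

// The raw attribute value from the output above.
const href =
  "/transport/flights/bos/cun/210301/210331/config/10081-2103010815--32733-0-10803-2103011250|10803-2103311225--31722-1-10081-2103312125?adults=1&adultsv2=1&cabinclass=economy&children=0&childrenv2=&destinationentityid=27540602&inboundaltsenabled=false&infants=0&originentityid=27539525&outboundaltsenabled=false&preferdirects=false&preferflexible=false&ref=home&rtn=1";

const parsed = parseListingHref(href);
console.log(parsed.from, parsed.to, parsed.cabinClass); // bos cun economy
```

Keeping the absolute URL alongside the parsed fields makes it possible to open each listing later with page.goto.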

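Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.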

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM