
How to loop through multiple elements based on class in Playwright JS test?

I am trying to write a Playwright JS test to scrape some values from a website.

Here is the HTML of the page I am trying to scrape:

<div class="pl-it">
   <div class="i_d">
      <dl>
         <dt class>Series:
         <dd>
            <a href=".....">Province A</a>
         </dd>
         </dt>
         <dt class>Catalog Codes:
         <dd>
            <strong>Mi</strong>
            <strong>CA 1x,</strong>
            "ca 17"
         </dd>
         </dt>
         <dt class>Variants:
         <dd><strong><a>Click to see variants</a></strong></dd>
         </dt>
      </dl>
   </div>
   <div class="i_d">
      <dl>
         <dt class>Series:
         <dd>
            <a href=".....">Province B</a>
         </dd>
         </dt>
         <dt class>Catalog Codes:
         <dd>
            <strong>Fu</strong>
            <strong>DE 2x,</strong>
            "pa 21"
         </dd>
         </dt>
         <dt class>Variants:
         <dd><strong><a>Click to see variants</a></strong></dd>
         </dt>
      </dl>
   </div>
</div>

As you can see, there are multiple divs with class i_d, and each of them contains a dl tag.

Inside each dl tag, there are several pairs of dt & dd tags.

Basically, what I am trying to do is log each dt value & each corresponding dd value to the console.

The final outcome should look something like this in the logs:

Series: Province A
Catalog Codes: Mi CA 1x, ca 17
Variants: Click to see variants

Series: Province B
Catalog Codes: Fu DE 2x, pa 21
Variants: Click to see variants

Below is my current output:

[
  {
    label: 'Series:',
    name: 'Province A'
  },
  {
    label: 'Series:',
    name: 'Province B'
  }
]

As you can see, it only prints the first dt & dd values in each group, not the remaining ones (i.e. Catalog Codes, etc.).

Here is my current Playwright JS code:

const { test, expect } = require('@playwright/test');

test('homepage has Playwright in title and get started link linking to the intro page', async ({ page }) => {
  await page.goto('https://colnect.com/en/stamps/list/country/38-Canada');

  await expect(page.locator('div#pageContent h1')).toContainText('Stamp catalog › Canada › Stamps');

  const books = await page.$$eval('div.i_d', all_items => {
    const data = [];
    all_items.forEach(book => {
      const label = book.querySelector('dt')?.innerText;
      const name = book.querySelector('dd')?.innerText;
      data.push({ label, name });
    });
    return data;
  });
  console.log(books);
});

Can someone please tell me how I can access each dt & dd rather than just the first one in each group?

The DOM is a little complicated, so I had to use nested loops to get the format you are looking for.

const { test, expect } = require('@playwright/test');

test.describe('Scrape', () => {
  test('Stamps', async ({ page }) => {
    await page.goto('https://colnect.com/en/stamps/list/country/38-Canada');
    await page.waitForLoadState('networkidle');
    await expect(page.locator('div#pageContent h1')).toContainText('Stamp catalog › Canada › Stamps');

    // Runs in the browser: build one object per div.i_d, keyed by the dt label.
    const scrapedStampData = await page.$$eval('div.i_d', (stamps) => {
      const stampsArray = [];
      stamps.forEach((stamp) => {
        const stampObject = {};
        // querySelectorAll returns every dt in this group, not just the first one.
        stamp.querySelectorAll('dt').forEach((row) => {
          const rowLabel = row.innerText;
          // The matching dd is the dt's next element sibling.
          const rowValue = row.nextElementSibling?.innerText ?? '';
          stampObject[rowLabel] = rowValue;
        });
        stampsArray.push(stampObject);
      });
      return stampsArray;
    });

    // Print each stamp's label/value pairs under a numbered banner.
    scrapedStampData.forEach((stampData, ind) => {
      console.log(`\n**************Stamp: ${ind + 1}*****************\n`);
      for (const key in stampData) {
        console.log(key + ' ' + stampData[key]);
      }
    });
  });
});

Output:

**************Stamp: 1*****************

Series: Province of Canada Pence Issue (imperforate)
Catalog codes: Mi:CA 1x, Sn:CA 8, Yt:CA 4, Sg:CA 17
Variants: Click to see variants
Themes: Crowns and Coronets | Famous People | Heads of State | Queens | Royalty | Women
Issued on: 1857-08-01
Colors: Rose
Printers: Rawdon, Wright, Hatch & Edson
Format: Stamp
Emission: Definitive
Perforation: Imperforate
Printing: Recess
Paper: machine-made medium to thick wove
Face value: ½ d - Canadian penny
Print run: 2,600,000
Score: 95% Accuracy: High
Buy Now: Find similar items on eBay
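
If you want the exact layout from the question (one blank line between groups, no banner), the array of per-stamp objects returned by $$eval can be joined into a single string on the Node side. This is only a minimal sketch: formatStamps is a hypothetical helper name, and the sample data is made up to mirror the shape of the scraped result rather than taken from the live page.

```javascript
// Hypothetical helper: turn an array of { label: value } objects into
// "label value" lines, with one blank line between groups.
function formatStamps(stamps) {
  return stamps
    .map((stamp) =>
      Object.entries(stamp)
        .map(([label, value]) => `${label} ${value}`)
        .join('\n')
    )
    .join('\n\n');
}

// Illustrative sample shaped like the $$eval result (not real scraped data).
const sample = [
  { 'Series:': 'Province A', 'Variants:': 'Click to see variants' },
  { 'Series:': 'Province B', 'Variants:': 'Click to see variants' },
];

console.log(formatStamps(sample));
```

Object.entries keeps string keys in insertion order, so the lines come out in the same order the dt elements were read from the page.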
