So I am trying to make a scrape just two elements but from more than only one website (in this case is PS Store). Also, I'm trying to achieve it in the easiest way possible. Since I'm rookie in JS, please be gentle ;) Below my script. I was trying to make it happen with a for loop but with no effect (still it got only the first website from the array). Thanks a lot for any kind of help.
const puppeteer = require("puppeteer");
async function scrapeProduct(url) {
const urls = [
"https://store.playstation.com/pl-pl/product/EP9000-CUSA09176_00-DAYSGONECOMPLETE",
"https://store.playstation.com/pl-pl/product/EP9000-CUSA13323_00-GHOSTSHIP0000000",
];
const browser = await puppeteer.launch();
const page = await browser.newPage();
for (i = 0; i < urls.length; i++) {
const url = urls[i];
const promise = page.waitForNavigation({ waitUntil: "networkidle" });
await page.goto(`${url}`);
await promise;
}
const [el] = await page.$x(
"/html/body/div[3]/div/div/div[2]/div/div/div[2]/h2"
);
const txt = await el.getProperty("textContent");
const title = await txt.jsonValue();
const [el2] = await page.$x(
"/html/body/div[3]/div/div/div[2]/div/div/div[1]/div[2]/div[1]/div[1]/h3"
);
const txt2 = await el2.getProperty("textContent");
const price = await txt2.jsonValue();
console.log({ title, price });
browser.close();
}
scrapeProduct();
In general, your code is quite okay. Few things should be corrected, though:
const puppeteer = require("puppeteer");
async function scrapeProduct(url) {
const urls = [
"https://store.playstation.com/pl-pl/product/EP9000-CUSA09176_00-DAYSGONECOMPLETE",
"https://store.playstation.com/pl-pl/product/EP9000-CUSA13323_00-GHOSTSHIP0000000",
];
const browser = await puppeteer.launch({
headless: false
});
for (i = 0; i < urls.length; i++) {
const page = await browser.newPage();
const url = urls[i];
const promise = page.waitForNavigation({
waitUntil: "networkidle2"
});
await page.goto(`${url}`);
await promise;
const [el] = await page.$x(
"/html/body/div[3]/div/div/div[2]/div/div/div[2]/h2"
);
const txt = await el.getProperty("textContent");
const title = await txt.jsonValue();
const [el2] = await page.$x(
"/html/body/div[3]/div/div/div[2]/div/div/div[1]/div[2]/div[1]/div[1]/h3"
);
const txt2 = await el2.getProperty("textContent");
const price = await txt2.jsonValue();
console.log({
title,
price
});
}
browser.close();
}
scrapeProduct();
{ headless: false }
. This allows you to see what actually happens in the browser.networkidle
in latest version from npm
. You should use networkidle0
or networkidle2
instead.html/body/div...
. This might be subjective, but I think standard JS/CSS selectors are more readable: body > div ...
. But, well, if it works...Code above yields the following in my case:
{ title: 'Days Gone™', price: '289,00 zl' }
{ title: 'Ghost of Tsushima', price: '289,00 zl' }
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.