puppeteer - export JSON file from loop
The exported file contains only one url. The rest of the urls are not found in the exported file. How can I generate a file with all the entries in the loop?
const puppeteer = require("puppeteer");
const fs = require('fs');
let browser;
(async () => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
});
const [page] = await browser.pages();
await page.goto('https://old.reddit.com/',{"waitUntil" : "networkidle0"});
const a_elems = await page.$$('.thumbnail');
for (var i=0; i<a_elems.length && i<3; i++) {
const elem = a_elems[i];
const href = await page.evaluate(e => e.href, elem);
const newPage = await browser.newPage();
await newPage.goto(href,{"waitUntil" : "networkidle0"});
const url = await newPage.evaluate(() => document.location.href);
console.log(url);
fs.writeFileSync('export.json', JSON.stringify(url));
}
await browser.close();
})()
;
Thanks!
Create an array, push each url onto it in the loop, then move your writeFile call to the end.
const puppeteer = require("puppeteer");
const fs = require('fs').promises;
let browser;
(async () => {
browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
});
const [page] = await browser.pages();
await page.goto('https://old.reddit.com/', {
"waitUntil": "networkidle0"
});
const aElems = await page.$$('.thumbnail');
const urls = [];
for (let i = 0; i < aElems.length && i < 3; i++) {
const href = await aElems[i].evaluate(e => e.href);
const newPage = await browser.newPage();
await newPage.goto(href, {waitUntil: "networkidle0"});
const url = await newPage.evaluate(() => document.location.href);
console.log(url);
urls.push(url);
}
await fs.writeFile('export.json', JSON.stringify(urls));
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
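To see why the original script ended up with a single url, it may help to reproduce the effect without Puppeteer: each writeFileSync call replaces the whole file, so only the last iteration survives. The sketch below is a minimal illustration under stated assumptions: the example.com urls, the file names and the temp directory are made up for the demo, and the sync calls are used only to keep it short.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-ins for the urls collected in the loop.
const sampleUrls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
];

// Throwaway directory so the demo does not touch a real export.json.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "export-demo-"));
const overwrittenFile = path.join(dir, "overwritten.json");
const collectedFile = path.join(dir, "collected.json");

// Writing inside the loop: every call replaces the previous contents.
for (const url of sampleUrls) {
  fs.writeFileSync(overwrittenFile, JSON.stringify(url));
}

// Collecting first, then writing once after the loop.
const urls = [];
for (const url of sampleUrls) {
  urls.push(url);
}
fs.writeFileSync(collectedFile, JSON.stringify(urls));

const overwritten = JSON.parse(fs.readFileSync(overwrittenFile, "utf8"));
const collected = JSON.parse(fs.readFileSync(collectedFile, "utf8"));

console.log(overwritten); // only the last url
console.log(collected);   // all three urls
```

The first file holds just the last url as a JSON string, while the second holds the full array, which is the same difference as between the question's script and the answer's.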
Tips:
- writeFileSync seems suboptimal here relative to the async version.
- Use let instead of var so you don't get bitten by i breaking scope and popping up with a stale value outside (or inside) the loop block.
- Close each page once you're done with it by calling newPage.close(); at the end of the loop.
- "waitUntil": "networkidle0" is really slow. Since all you're doing is accessing document.location.href, you can probably speed things up with waitUntil: "domcontentloaded".
- Name variables in camelCase, not snake_case (aElems rather than a_elems).
- With an element handle, call elementHandle.evaluate(...) rather than page.evaluate(..., elementHandle).
- Catch errors with catch and clean up the browser resource with finally.
- let browser; was pointless in your original code.
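The let versus var tip can be illustrated with a small sketch that is independent of Puppeteer. The function names here are hypothetical and exist only for the demo; the point is that a var counter is shared by every closure created in the loop and stays visible after it, while let gives each iteration its own binding.

```javascript
// var is function-scoped: every callback closes over the same i,
// which has already reached 3 by the time the callbacks run.
function collectWithVar() {
  const callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((fn) => fn());
}

// let is block-scoped: each iteration gets a fresh i.
function collectWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((fn) => fn());
}

// The var counter is still readable after the loop has finished.
function counterAfterVarLoop() {
  for (var i = 0; i < 3; i++) {}
  return i;
}

console.log(collectWithVar());      // [ 3, 3, 3 ]
console.log(collectWithLet());      // [ 0, 1, 2 ]
console.log(counterAfterVarLoop()); // 3
```

In the question's loop the var counter happens to be harmless, but the same pattern tends to bite once callbacks or deferred work capture i.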