Using Puppeteer, how can I run a script in the page context, with the full DOM available, before the in-page JS is executed?
For example, how can I run the following script to remove alt attributes from img elements before any of the page JS is run?
document.querySelectorAll('img[alt]').forEach(
e => e.removeAttribute('alt')
)
(page.evaluateOnNewDocument looks like it would be useful, but it appears to run before the page content is available: at the point at which it runs, the page is blank.)
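I think the way to achieve what you are looking for is to:
1. Disable JavaScript with page.setJavaScriptEnabled(false).
2. Navigate to the page.
3. In the page, save the script's contents and the HTML without the script.
4. Re-enable JavaScript with page.setJavaScriptEnabled(true).
5. Navigate with page.goto(`data:text/html,${HTMLWithoutScript}`), using the HTML from step 3.
6. Run your code that removes the alt attributes.
7. Inject the script from step 3 with page.addScriptTag({ content: script }).
Here is a reproduction of the problem with your example: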
const puppeteer = require('puppeteer');

// Test page: the inline script copies every img alt into the .alts div.
const html = `
<html>
  <head></head>
  <body>
    <img src="https://picsum.photos/200/300?image=1062" alt="dog ">
    <img src="https://picsum.photos/200/300?image=1072" alt="car ">
    <div class="alts">List of alts: </div>
    <script>
      const images = document.querySelectorAll('img');
      const altsContainer = document.querySelector('.alts');
      images.forEach(image => {
        const alt = image.getAttribute('alt') || 'missing alt ';
        altsContainer.insertAdjacentHTML('beforeend', alt);
      });
    </script>
  </body>
</html>`;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // The inline script runs while the page loads, i.e. before goto resolves.
  await page.goto(`data:text/html,${html}`);
  // Too late: the alts have already been copied into .alts.
  await page.evaluate(() => {
    document.querySelectorAll('img[alt]').forEach(
      e => e.removeAttribute('alt')
    );
  });
  await page.screenshot({ path: 'image.png' });
  await browser.close();
})();
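This code produces a screenshot in which the list still reads "List of alts: dog car", so removing the alts this way does not work: the page's inline script has already run while page.goto was loading the page, before page.evaluate gets a chance to remove them.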
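To make the ordering problem concrete without a browser, here is a DOM-free model. It is a hypothetical sketch, not Puppeteer code: images are plain objects, pageScript stands in for the inline script (it builds the same "List of alts" text, falling back to 'missing alt ' as the original does), and removeAlts stands in for the page.evaluate call.

```javascript
// Hypothetical DOM-free model of the page above (not Puppeteer code).
// Each image is a plain object; a missing `alt` key means "no alt attribute".

// Stands in for the inline <script>: reads every alt and returns the list text.
function pageScript(images) {
  return 'List of alts: ' + images.map(img => img.alt || 'missing alt ').join('');
}

// Stands in for the page.evaluate() call that strips the alt attributes.
function removeAlts(images) {
  images.forEach(img => { delete img.alt; });
}

// First snippet's order: the page script runs during goto, removal happens afterwards.
const late = [{ alt: 'dog ' }, { alt: 'car ' }];
const lateText = pageScript(late);
removeAlts(late);

// Order the question asks for: strip alts first, then let the page script run.
const early = [{ alt: 'dog ' }, { alt: 'car ' }];
removeAlts(early);
const earlyText = pageScript(early);

console.log(JSON.stringify(lateText));  // "List of alts: dog car "
console.log(JSON.stringify(earlyText)); // "List of alts: missing alt missing alt "
```

Removing the alts after the fact leaves the already-built text untouched, which is why a fix has to change when the page script runs rather than how the alts are removed.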
// Continues the previous snippet: reuses `puppeteer` and `html`.
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load the page with JavaScript disabled so the inline script does not run.
  await page.setJavaScriptEnabled(false);
  await page.goto(`data:text/html,${html}`);

  // Save the script's source, then empty it and keep the remaining body markup.
  const { script, HTMLWithoutScript } = await page.evaluate(() => {
    const scriptEl = document.querySelector('script');
    const script = scriptEl.innerHTML;
    scriptEl.innerHTML = '';
    const HTMLWithoutScript = document.body.innerHTML;
    return { script, HTMLWithoutScript };
  });

  // Re-enable JavaScript and load the script-free markup.
  await page.setJavaScriptEnabled(true);
  await page.goto(`data:text/html,${HTMLWithoutScript}`);

  // Remove the alts before any page script has run.
  await page.evaluate(() => {
    document.querySelectorAll('img[alt]').forEach(
      e => e.removeAttribute('alt')
    );
  });

  // Only now run the original page script.
  await page.addScriptTag({ content: script });
  await page.screenshot({ path: 'image.png' });
  await browser.close();
})();
This produces the result you expect: the page script no longer finds any alt attributes, so the list reads "List of alts: missing alt missing alt".
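Both snippets interpolate raw HTML into a data: URL. That works for this page, but a '#' in the markup would start a URL fragment, which the browser drops from a data: URL's body, and a '%' followed by two hex digits would be percent-decoded. Below is a minimal sketch, assuming a hypothetical toDataUrl helper (my name, not a Puppeteer API) that escapes the markup with the built-in encodeURIComponent; Node's URL class is used only to show how each string is parsed.

```javascript
// Hypothetical helper, not a Puppeteer API: wraps raw HTML in a data: URL.
function toDataUrl(html) {
  // encodeURIComponent escapes '#', '%', spaces, newlines, '<', '>' and so on,
  // so the browser reads the whole string as the document body.
  return 'data:text/html,' + encodeURIComponent(html);
}

const markup = '<p>Item #1 is done</p>';

// Raw interpolation: everything after '#' is parsed as a URL fragment,
// and the fragment is not part of a data: URL's body.
const raw = new URL(`data:text/html,${markup}`);

// Encoded: no '#' remains, so nothing is split off.
const encoded = new URL(toDataUrl(markup));

console.log(raw.hash);     // starts with "#1": the tail of the markup was split off
console.log(encoded.hash); // empty string
```

In the second snippet you would then call page.goto(toDataUrl(HTMLWithoutScript)) instead of interpolating the markup directly.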
You can move your script tags to the body instead of the head, and then run your script from the body's onload event. According to MDN, this event fires when an object has been loaded. Below is example code:
function removeAlt() {
  document.querySelectorAll('img[alt]').forEach((e) => {
    e.removeAttribute('alt');
  });
}

<body onload="removeAlt()">
  <img src="http://placehold.it/64x64" alt="1">
  <img src="http://placehold.it/64x64" alt="2">
</body>
Let me know whether this fits your requirements. I tested it, and the function removes the alt attributes from the images.