How to manipulate the DOM before in-page scripts are executed?

Question

Using Puppeteer, how can I run a script in the page context, with the full DOM available, before the in-page JS is executed?

For example, how can I run the following script to remove alt attributes from img elements, before any of the page JS is run?

document.querySelectorAll('img[alt]').forEach(
  e => e.removeAttribute('alt')
)

(page.evaluateOnNewDocument looks like it would be useful, but it appears to run before the page content is available: at the point at which it runs, the page is blank.)

Answer 1

I think the way to achieve what you are looking for is the following:

1. Disable JavaScript with page.setJavaScriptEnabled(false).
2. Navigate to the page.
3. Extract all the scripts, and the HTML without the scripts.
4. Re-enable JavaScript with page.setJavaScriptEnabled(true).
5. Navigate with page.goto(`data:text/html,${HTMLWithoutScript}`), using the script-free HTML from step 3.
6. Execute your own scripts.
7. Inject the original scripts extracted in step 3 with page.addScriptTag({ content: script }).
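
One detail of the data: URL step is worth a sketch. If the script-free HTML is pasted into the URL as-is, a character such as # is parsed as the start of a URL fragment and the rest of the markup is lost. A minimal sketch, assuming a hypothetical helper named toDataUrl (an illustrative name, not part of Puppeteer), percent-encodes the markup first:

```javascript
// Hypothetical helper: build a data: URL whose payload survives
// characters such as '#', '%' and newlines.
function toDataUrl(html) {
  return `data:text/html;charset=utf-8,${encodeURIComponent(html)}`;
}

const url = toDataUrl('<a href="#top">50% off</a>');

// Decoding the payload gives back the original markup unchanged.
console.log(decodeURIComponent(url.split(',')[1]));
```

With this helper, the navigation would read page.goto(toDataUrl(HTMLWithoutScript)).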

Example

Here is a reproduction of the problem from your question:

const puppeteer = require('puppeteer');

const html = `
<html>
    <head></head>
    <body>
        <img src="https://picsum.photos/200/300?image=1062" alt="dog ">
        <img src="https://picsum.photos/200/300?image=1072" alt="car ">
        <div class="alts">List of alts: </div>
        <script>
            const images = document.querySelectorAll('img');
            const altsContainer = document.querySelector('.alts');
            images.forEach(image => {
                const alt = image.getAttribute('alt') || 'missing alt ';
                altsContainer.insertAdjacentHTML('beforeend', alt);
            })
        </script>
    </body>
</html>`;

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(`data:text/html,${html}`);
    await page.evaluate(() => {
        document.querySelectorAll('img[alt]').forEach(
            e => e.removeAttribute('alt')
        )
    });
    await page.screenshot({ path: 'image.png' });
    await browser.close();
})();

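In the screenshot this code produces, the list still shows the original alt texts, so removing the alts does not work here: the inline script has already run by the time page.evaluate is called.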

Solution

// Reuses `puppeteer` and `html` from the example.
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Load the page with JavaScript disabled so the inline script does not run.
    await page.setJavaScriptEnabled(false);
    await page.goto(`data:text/html,${html}`);

    // Pull out the inline script and keep the remaining markup.
    const { script, HTMLWithoutScript } = await page.evaluate(() => {
        const script = document.querySelector('script').innerHTML;
        document.querySelector('script').innerHTML = '';
        const HTMLWithoutScript = document.body.innerHTML;
        return { script, HTMLWithoutScript };
    });

    // Reload the script-free markup with JavaScript enabled again.
    await page.setJavaScriptEnabled(true);
    await page.goto(`data:text/html,${HTMLWithoutScript}`);

    // Change the DOM first...
    await page.evaluate(() => {
        document.querySelectorAll('img[alt]').forEach(
            e => e.removeAttribute('alt')
        );
    });

    // ...and only then run the page's original script.
    await page.addScriptTag({ content: script });
    await page.screenshot({ path: 'image.png' });
    await browser.close();
})();
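
The solution above handles a single inline script and keeps only the body. As a rough sketch of how the same steps might be generalized, the hypothetical helpers below (gotoWithPatchedDom, extractScripts and removeAlts are illustrative names, not Puppeteer APIs) take an existing Puppeteer page, detach every script element, keep the whole document, and re-inject the scripts in their original order. It assumes external scripts can simply be re-added by URL, it does not preserve the doctype, and it has not been tried against real sites.

```javascript
// Runs inside the page: detach every <script> and return the scripts
// together with the remaining HTML.
function extractScripts() {
  const nodes = Array.from(document.querySelectorAll('script'));
  const scripts = nodes.map((s) =>
    s.src ? { url: s.src } : { content: s.textContent }
  );
  nodes.forEach((s) => s.remove());
  return { scripts, html: document.documentElement.outerHTML };
}

// Runs inside the page: the DOM change from the question.
function removeAlts() {
  document.querySelectorAll('img[alt]').forEach((e) => e.removeAttribute('alt'));
}

// `page` is a Puppeteer Page; `patch` is a function to run in the page
// after the DOM is loaded but before the original scripts are re-added.
async function gotoWithPatchedDom(page, url, patch) {
  // First pass: load with JavaScript disabled so nothing executes.
  await page.setJavaScriptEnabled(false);
  await page.goto(url);
  const { scripts, html } = await page.evaluate(extractScripts);

  // Second pass: load the script-free markup with JavaScript enabled.
  await page.setJavaScriptEnabled(true);
  await page.goto(`data:text/html;charset=utf-8,${encodeURIComponent(html)}`);

  // Apply the DOM change, then replay the scripts in document order.
  await page.evaluate(patch);
  for (const script of scripts) {
    await page.addScriptTag(script);
  }
}
```

Inside the async function from the example, this could be called as await gotoWithPatchedDom(page, `data:text/html,${html}`, removeAlts) before taking the screenshot.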

This produces the result the question asks for: the list reports a missing alt for both images.

Answer 2

You can move your script tags into the body instead of the head, and then run your script from the body's onload event. According to MDN, this event fires when an object has been loaded. Below is the example code:

function removeAlt() {
    document.querySelectorAll('img[alt]').forEach((e) => {
        e.removeAttribute('alt');
    });
}

<body onload="removeAlt()">
    <img src="http://placehold.it/64x64" alt="1">
    <img src="http://placehold.it/64x64" alt="2">
</body>

Let me know whether this fits your requirement. I tested it, and the function removes the alt attributes from the images.
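
For reference, the same handler can be registered from script rather than through the onload attribute. This is only a sketch: the root parameter is an assumption added here so the function can be exercised outside a browser. Note that the load event fires after the page's own synchronous scripts have run, so this variant may not satisfy the "before in-page JS" requirement in the question.

```javascript
// Same removal as in the answer, but the node to search is passed in.
function removeAlt(root) {
  root.querySelectorAll('img[alt]').forEach((e) => e.removeAttribute('alt'));
}

// In a browser, register the handler from script instead of using the
// onload attribute; outside a browser this branch is skipped.
if (typeof window !== 'undefined') {
  window.addEventListener('load', () => removeAlt(document));
}
```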

2 answers

Answer 1: 6, 2018-02-09 13:22:19
Answer 2: 0, 2018-02-05 10:47:58
