
How to use CSS selectors deterministically with puppeteer?

I am trying to customise a puppeteer script that plays a song on soundcloud and records it. Using a CSS selector I would like to print the song duration as well. I can't seem to get the CSS selector to work. The url I am working with is https://soundcloud.com/octasine/octasine-audio-example-1

I have a working CSS selector now and can grab the minutes and seconds from the page. The challenge I am seeing is that sometimes the page hasn't finished rendering and I get an empty array in return. Using await page.waitForNavigation(); just causes the promise to fail.

What am I missing to get puppeteer to work more reliably?

This is how I am using the CSS selector:

    const work = async () => {
        const inputsValues = [];
        const inputElements = await page.$$('span.sc-visuallyhidden');
        
        for (const element of inputElements) {
            let inputValue;

            inputValue = await element.getProperty('innerText');
            inputValue = await inputValue.jsonValue();
            if (inputValue.includes('Duration')) {
                console.log("DURATION");
                mins = inputValue.split(" ")[1];
                secs = inputValue.split(" ")[3];
                console.log(mins);
                console.log(secs);
                console.log(inputValue);
            }

            inputsValues.push(inputValue);
        }
    
        console.log(inputsValues)
    }
    await work();

My complete script, example.js:

// example.js -- node version v14.17.2 -- dependency installed with npm i puppeteer-stream 
const { launch, getStream }  = require("puppeteer-stream");
const fs = require("fs");
const { Console } = require("console");

const file = fs.createWriteStream(__dirname + "/test.webm");

async function test() {
    const browser = await launch();

    const page = await browser.newPage();
    await page.goto("https://soundcloud.com/octasine/octasine-audio-example-1");

    // await page.waitForNavigation();
    
    let html_var = await page.content();
    // Write the file
    fs.writeFile("example.html", html_var, function (err) {
        // Checks if there is an error
        if (err) return console.log(err);
    });
    console.log("Wrote html to example.html");


    // await page.click("//a[contains(text(), 'Play')]");
    await page.evaluate(() => {
        let elements = document.getElementsByClassName('snippetUXPlayButton');
        for (let element of elements)
            element.click();
    });

    const work = async () => {
        const inputsValues = [];
        const inputElements = await page.$$('span.sc-visuallyhidden');
        
        for (const element of inputElements) {
            let inputValue;

            inputValue = await element.getProperty('innerText');
            inputValue = await inputValue.jsonValue();
            if (inputValue.includes('Duration')) {
                console.log("DURATION");
                mins = inputValue.split(" ")[1];
                secs = inputValue.split(" ")[3];
                console.log(mins);
                console.log(secs);
                console.log(inputValue);
            }

            inputsValues.push(inputValue);
        }
    
        console.log(inputsValues)
    }
    await work();


    let page_url = await page.url();
    console.log(page_url)


    
    await page.evaluate(() => {
        let elements = document.getElementsByClassName('sc-visuallyhidden');
        for (let element of elements)
            console.log(element.innerHTML);
    });

    const stream = await getStream(page, { audio: true, video: true });
    console.log("recording");

    stream.pipe(file);
    setTimeout(async () => {
        await stream.destroy();
        file.close();
        console.log("finished");
        browser.close();
    }, 1000 * 5 + mins * 60000 + secs * 1000);

}

test();

Script based on example script from https://www.npmjs.com/package/puppeteer-stream

The elements matching span.sc-visuallyhidden are inserted into the DOM dynamically, one by one, so the length of $$('span.sc-visuallyhidden') grows as the page loads. At the moment you populate your inputElements array, it may not contain the Duration element yet.

To make sure it is in your set of elements, wait until it has been rendered into the DOM, e.g. by waiting for its exact selector:

await page.waitForSelector('.playbackTimeline__duration > span.sc-visuallyhidden')

I also suggest refactoring your work() function into a single page.$$eval call like this:

const inputsValues = await page.$$eval('span.sc-visuallyhidden', elems => elems.map(el => el.innerText))
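
Put together, the two calls above could be wrapped roughly as follows. This is only a sketch: readHiddenLabels and its timeoutMs parameter are names made up for illustration, and page is assumed to be the puppeteer Page object from your script.

```javascript
// Selector taken from the answer above; it may change if SoundCloud updates its markup.
const DURATION_SELECTOR = '.playbackTimeline__duration > span.sc-visuallyhidden';

// Hypothetical helper: wait for the duration element, then collect every hidden label.
async function readHiddenLabels(page, timeoutMs = 30000) {
    // Resolves once the element exists in the DOM; rejects if timeoutMs elapses first.
    await page.waitForSelector(DURATION_SELECTOR, { timeout: timeoutMs });
    // The callback runs inside the browser and returns a plain array of strings.
    return page.$$eval('span.sc-visuallyhidden', elems => elems.map(el => el.innerText));
}
```

Inside test() you would call it after clicking play, e.g. const inputsValues = await readHiddenLabels(page);, and handle the rejection if the element never appears.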

Output is:

8 months ago, 2,452 plays, View all likes, View all reposts, 10 followers, 2 tracks, 414 plays, View all likes, View all comments, Current time: 0 seconds, Duration: 2 minutes 26 seconds, Current track: Octasine Audio Example 1

...which contains Duration: 2 minutes 26 seconds, and you can process that into mins and secs as before.
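
If splitting on spaces feels fragile (for instance, a track shorter than a minute would presumably have no "minutes" part), a regular expression is one alternative. The sketch below assumes the label always looks like "Duration: N minutes M seconds" with optional hour and minute parts, which I have not verified for every track; parseDurationMs is a hypothetical helper name.

```javascript
// Hypothetical helper: convert a label such as "Duration: 2 minutes 26 seconds"
// into milliseconds. Returns null for labels that are not a duration.
function parseDurationMs(label) {
    if (!label.trim().startsWith('Duration:')) return null;
    const unitMs = { hour: 3600000, minute: 60000, second: 1000 };
    let total = 0;
    let matched = false;
    // Matches "<number> <unit>" pairs; the plural "s" is optional.
    for (const [, amount, unit] of label.matchAll(/(\d+)\s+(hour|minute|second)s?/g)) {
        total += Number(amount) * unitMs[unit];
        matched = true;
    }
    return matched ? total : null;
}

// Sample labels copied from the output above.
const labels = ['Current time: 0 seconds', 'Duration: 2 minutes 26 seconds'];
const durationMs = labels.map(parseDurationMs).find(ms => ms !== null);
console.log(durationMs); // 146000
```

The resulting durationMs can then replace the mins * 60000 + secs * 1000 part of the setTimeout delay in your script.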
