
Functions and Web-scraping with Puppeteer

The goal is to open the HTML page, enter a booking.com URL, click the button, and have the scraped hotel name, rating, etc. printed to the console.

So far, nothing is returned when I click the button. It works when the URL is hard-coded, but in this form it says "main is declared but value is never read". Am I calling the function incorrectly? I'm still new to Puppeteer, so perhaps I'm overlooking something?

Here is app.js:

function main()
{
    var Url = document.getElementById('inputUrl').value

    const puppeteer = require('puppeteer');

    let bookingUrl = Url;
    (async () => {
        const browser = await puppeteer.launch({ headless: true });
        const page = await browser.newPage();
        await page.goto(bookingUrl);

        // get hotel details
        let hotelData = await page.evaluate(() => {
            let hotels = [];
            // get the hotel elements
            let hotelsElms = document.querySelectorAll('div.sr_property_block[data-hotelid]');
            // get the hotel data
            hotelsElms.forEach((hotelelement) => {
                let hotelJson = {};
                try {
                    hotelJson.name = hotelelement.querySelector('span.sr-hotel__name').innerText;
                    hotelJson.reviews = hotelelement.querySelector('div.bui-review-score__text').innerText;
                    hotelJson.rating = hotelelement.querySelector('div.bui-review-score__badge').innerText;
                    if (hotelelement.querySelector('div.bui-price-display__value.prco-inline-block-maker-helper')) {
                        hotelJson.price = hotelelement.querySelector('div.bui-price-display__value.prco-inline-block-maker-helper').innerText;
                    }
                    hotelJson.imgUrl = hotelelement.querySelector('img.hotel_image').attributes.src.textContent;
                } catch (exception) {
                    // errors from missing elements are ignored
                }
                hotels.push(hotelJson);
            });
            return hotels;
        });
        console.dir(hotelData);
    })();
}

Here is index.html:

<!DOCTYPE html>

 <html lang="en">
    <head>
        <meta charset="utf-8">
        <script src="app.js" type="text/javascript"></script> 
        <title></title>
        <meta name="description" content="">
        <link rel="stylesheet" href="">
    </head>

    <input id = "inputUrl" type="text" placeholder = "type url here"/>
    <button id = "button" button onclick="main();"> click</button>

    <body>

        <script src="" async defer></script>
    </body>
</html>

You could add this before the page.evaluate call, so that the hotel elements exist before they are queried:

await page.waitForSelector('div.sr_property_block[data-hotelid]');
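
For context, here is a minimal sketch of how that wait might fit into a script that takes the URL as a parameter. It assumes the code runs under Node.js: Puppeteer is a Node.js library, and require is not defined in a plain browser script, which may be why clicking the button on the page does nothing. The file name scrape.js and the functions extractHotels and scrapeHotels are made up for illustration; the selectors are copied from the question and may no longer match booking.com's markup.

```javascript
// scrape.js: meant to be run with Node.js, not loaded from a <script> tag.

// Selector taken from the question; booking.com's markup may have changed.
const HOTEL_SELECTOR = 'div.sr_property_block[data-hotelid]';

// Hypothetical helper. It runs inside the page via page.$$eval,
// so it must not reference variables from the outer scope.
function extractHotels(elements) {
  return elements.map((el) => {
    // Trimmed text of the first match, or null when the element is missing.
    const text = (selector) => {
      const node = el.querySelector(selector);
      return node ? node.innerText.trim() : null;
    };
    const img = el.querySelector('img.hotel_image');
    return {
      name: text('span.sr-hotel__name'),
      reviews: text('div.bui-review-score__text'),
      rating: text('div.bui-review-score__badge'),
      price: text('div.bui-price-display__value.prco-inline-block-maker-helper'),
      imgUrl: img ? img.getAttribute('src') : null,
    };
  });
}

// Hypothetical helper: opens the URL, waits for the hotel blocks,
// and resolves with an array of plain objects.
async function scrapeHotels(bookingUrl) {
  // Required here so that loading this file does not need Puppeteer yet.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(bookingUrl);
    // Wait until at least one hotel block exists before querying the DOM.
    await page.waitForSelector(HOTEL_SELECTOR);
    return await page.$$eval(HOTEL_SELECTOR, extractHotels);
  } finally {
    // Close the browser even if navigation or the wait fails.
    await browser.close();
  }
}
```

From a Node script this could be called as scrapeHotels(url).then((hotels) => console.dir(hotels)). Unlike the try/catch in the question, missing fields come back as null instead of being silently dropped. The HTML page would then need to send the URL to that Node process rather than call Puppeteer itself.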
