How to use Puppeteer on a Node server and get the results on a front-end HTML page?

Question

I'm just starting to learn Node and Puppeteer, so please forgive me in advance for being a noob.

I have a simple form on my index.html page, and I want it to return the images for an Instagram profile from a function on a Node server running Puppeteer. The code below contains an index.html file and an index.js file. When the button in index.html is clicked, I want to call the server with an AJAX request that passes in the username, run that function on the server, return the result to the HTML file, and put the response text into the .images div (I can split the result and render img tags later).

I have a couple of questions:

1: I am running the server script with the Live Server plugin in VS Code, and it serves the file at http://127.0.0.1:5500/12_Puppeteer/12-scraping-instagram/index.js. Is that now the endpoint? How do I then pass the username to the server function: in the headers or in the URL? Can you show me?

2: In the AJAX request in my index.html file, what does the request need to look like to pass the username through to the server's scrapeImages(username) function and get back what it returns?

This is what I've tried in my index.html file:

       <body>
            <form>
                Username: <input type="text" id="username">&nbsp;&nbsp;
                <button id="clickMe" type="button" value="clickme" onclick="scrape(username.value);">
                Scrape Account Images</button>
            </form>

            <div class="images">
            </div>
        </body>

        <script>
            function scrape() {
                var xhttp = new XMLHttpRequest();
                xhttp.onreadystatechange = function() {
                    if (this.readyState == 4 && this.status == 200) {
                    document.querySelector(".images").innerHTML = this.responseText;
                    }
                };
                xhttp.open("GET", "http://127.0.0.1:5500/12_Puppeteer/12-scraping-instagram/index.js", true);
                xhttp.send();
            }


        </script>

This is my index.js file (it works when I debug it with my own username and password):

const puppeteer = require("puppeteer");
const fs = require('fs');

async function scrapeImages (username) {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('https://www.instagram.com/accounts/login/')

    await page.type('[name=username]','xxxxxx@gmail.com')
    await page.type('[name=password]','xxxxxx')

    await page.click('[type=submit]')
    await page.goto(`https://www.instagram.com/${username}`);

    await page.waitForSelector('img', {
        visible: true,
    })

    const data = await page.evaluate( () => {
        const images = document.querySelectorAll('img');
        const urls = Array.from(images).map(v => v.src + '||');
        return urls;
    } );


    fs.writeFileSync('./myData2.txt', data);


    return data;
}

Answer 1

You'll have to set up a Node server, with Express or any similar framework, then pass the username to it with a GET or POST request and read it on the server with Node/Express. Then you can run Puppeteer with it. Note that Live Server only serves index.js as a static file; it does not execute it, so that URL is not an endpoint.
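To make the "pass the username in the URL" idea concrete, here is a minimal sketch that uses only Node's built-in URL parsing, so it runs without Express. It accepts the username either as a path segment or as a query parameter. The function names `extractUsername` and `handleRequest`, and the `/instascraper` query-string route, are assumptions made for illustration; they are not part of any library.

```javascript
// Sketch only: extractUsername and handleRequest are hypothetical helper names.

// Pull the username out of a request URL such as
//   /instascraper/user/someName        (path segment)
//   /instascraper?username=someName    (query string, an assumed alternative route)
function extractUsername(requestUrl) {
    // The base is only needed so a relative request path can be parsed
    const url = new URL(requestUrl, 'http://127.0.0.1:8888');

    // Option 1: username as the last path segment
    const match = url.pathname.match(/^\/instascraper\/user\/([^/]+)\/?$/);
    if (match) {
        try {
            return decodeURIComponent(match[1]);
        } catch (error) {
            return null; // malformed percent-encoding
        }
    }

    // Option 2: username as a query parameter
    if (url.pathname === '/instascraper') {
        return url.searchParams.get('username');
    }

    return null;
}

// A plain Node request handler that uses the helper above.
// The scraping step is left out; this only shows where the username arrives.
function handleRequest(req, res) {
    const username = extractUsername(req.url);
    if (!username) {
        res.writeHead(400, { 'Content-Type': 'text/plain' });
        res.end('Missing username');
        return;
    }
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(`Would scrape images for ${username}`);
}

// To try it locally:
//   require('http').createServer(handleRequest).listen(8888)
```

Started with `node`, a server like this answers on port 8888. That port, not the Live Server address, is what the page's AJAX request should target.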

For example, suppose your Node.js/Express server is running on port 8888. Your HTML would look like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    <form method="post">
        Username: <input type="text" name="username" id="username">&nbsp;&nbsp;
        <button id="clickMe" type="button" value="clickme" onclick="getImages(this.form.username.value)">
        Scrape Account Images</button>
    </form>

    <div id="scrapedimages"></div>
    <script>
        let imgArray

        const getImages = (username) => {
            var xhttp = new XMLHttpRequest();
            xhttp.onreadystatechange = function () {
                // readyState 4 means the request has finished; status 200 means it succeeded
                if (this.readyState == 4 && this.status == 200) {
                    const container = document.querySelector('#scrapedimages')
                    container.innerHTML = ''
                    // The server replies with JSON of the form { "images": "url1,url2,..." }
                    imgArray = JSON.parse(this.responseText)
                    if ( imgArray.images.length > 0 ) {
                        imgArray.images.split(',').forEach( function (source) {
                            var image = document.createElement('img')
                            image.src = source
                            container.appendChild(image)
                        })
                    }
                }
            };
            // encodeURIComponent keeps special characters in the username from breaking the URL
            xhttp.open('GET', 'http://127.0.0.1:8888/instascraper/user/' + encodeURIComponent(username), true);
            xhttp.send();
        }
    </script>
</body>
</html>
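The client-side logic in the page above comes down to two steps: build the request URL from the username, then turn the response text into a list of image sources. The sketch below separates those steps into small functions and uses `fetch` instead of `XMLHttpRequest`. It assumes the response has the shape `{ "images": "url1,url2" }`, and the names `API_BASE`, `buildScrapeUrl`, `parseImageSources`, and `getImages` are illustrative choices, not an established API.

```javascript
// Sketch only: the helper names below are hypothetical.
const API_BASE = 'http://127.0.0.1:8888/instascraper/user/';

// Build the endpoint URL, encoding the username so special characters are safe
function buildScrapeUrl(username) {
    return API_BASE + encodeURIComponent(username.trim());
}

// Convert the response body into an array of image URLs.
// Assumes the body is JSON with "images" as a comma-separated string.
function parseImageSources(responseText) {
    const payload = JSON.parse(responseText);
    if (typeof payload.images !== 'string') {
        return [];
    }
    return payload.images.split(',').filter(source => source.length > 0);
}

// Browser-only part: request the images and render them into #scrapedimages
async function getImages(username) {
    const response = await fetch(buildScrapeUrl(username));
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    const sources = parseImageSources(await response.text());

    const container = document.querySelector('#scrapedimages');
    container.innerHTML = '';
    for (const source of sources) {
        const image = document.createElement('img');
        image.src = source;
        container.appendChild(image);
    }
}
```

Keeping the URL building and the parsing free of DOM calls means they can be checked in isolation, for example from Node, before wiring them to the button.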

Then, on your Node.js server, the script would look like this:

const puppeteer = require('puppeteer')
const express = require('express')
const app = express()
const port = 8888

// Credentials of the Instagram account used to log in (placeholders)
const username = 'usernameInstaGram'
const password = 'passwordInstaGram'

;(async () => {

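    app.get('/instascraper/user/:userID', async (request, response) => {
        // CORS headers so a page served from another origin (e.g. Live Server on port 5500) can call this endpoint
        response.set({
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Methods': 'POST, GET, PUT, DELETE, OPTIONS',
            'Access-Control-Allow-Headers': 'Content-Type',
            'Content-Type': 'application/json'
        })

        try {
            // The :userID path segment carries the profile name sent by the page
            const profile = request.params.userID
            const content = await scrapeImages (profile)
            response.send(content)
        } catch (error) {
            // Without this, a failed scrape would leave the request hanging
            response.status(500).send(JSON.stringify({ images: '', error: error.message }))
        }
    })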

    app.listen(port, () => {
        console.log(`Instascraper server listening on port ${port}!`)
    })

    const scrapeImages = async profile => {

        const browser = await puppeteer.launch()
        const [page] = await browser.pages()

        await page.goto('https://www.instagram.com/accounts/login/', {waitUntil: 'networkidle0', timeout: 0})

        await page.waitForSelector('[name=username]', {timeout: 0})
        await page.type('[name=username]', username)
        await page.waitForSelector('[name=password]', {timeout: 0})
        await page.type('[name=password]',password)

        await Promise.all([
            page.waitForNavigation(),
            page.click('[type=submit]')
        ])

        await page.waitForSelector('input[placeholder="Search"]', {timeout: 0})
        await page.goto(`https://www.instagram.com/${profile}`, {waitUntil: 'networkidle0', timeout: 0})

        await page.waitForSelector('body section > main > div > header ~ div ~ div > article a[href] img[srcset]', {visible:true, timeout: 0})

        const data = await page.evaluate( () => {
            const images = document.querySelectorAll('body section > main > div > header ~ div ~ div > article a[href] img[srcset]')
            const urls = Array.from(images).map(img => img.src )
            return urls;
        })

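        await browser.close()

        // Serialize with JSON.stringify so special characters in a URL cannot break the JSON.
        // The URLs are joined with commas because the page splits the "images" string on ','.
        return JSON.stringify({ images: data.join(',') })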
    }

})()

Solution 1
4 Accepted 2019-12-25 06:03:12
