
How to use Puppeteer on a Node server and get the results on a frontend HTML page?

I'm just starting to learn Node and Puppeteer, so apologies in advance for being a noob.

I have a simple form on my index.html page, and I want it to return the images of an Instagram profile from a function on a Node server running Puppeteer. The code below consists of an index.html file and an index.js file. In index.html, when the button is clicked, I just want to call the server with an AJAX request that passes in the username, run that function on the server, return the result to the HTML page, and put the response text into the .images div (I can split the result and render img tags later).

I have a couple of questions:

1: I am running server.js with the Live Server extension in VS Code, and it serves the file at http://127.0.0.1:5500/12_Puppeteer/12-scraping-instagram/index.js. Is that now the endpoint? If so, how do I pass the username to the server function: in the headers or in the URL? Can you show me?

2: In my AJAX request in index.html, what does the request need to look like to pass the username through to the server's scrapeImages(username) function and get back what it returns?


This is what I've tried in my index.html file:

<body>
    <form>
        Username: <input type="text" id="username">&nbsp;&nbsp;
        <button id="clickMe" type="button" value="clickme" onclick="scrape(username.value);">
        Scrape Account Images</button>
    </form>

    <div class="images">
    </div>

    <script>
        function scrape() {
            var xhttp = new XMLHttpRequest();
            xhttp.onreadystatechange = function() {
                if (this.readyState == 4 && this.status == 200) {
                    document.querySelector(".images").innerHTML = this.responseText;
                }
            };
            xhttp.open("GET", "http://127.0.0.1:5500/12_Puppeteer/12-scraping-instagram/index.js", true);
            xhttp.send();
        }
    </script>
</body>

This is my index.js file (it works when I debug it with my own username and password):

const puppeteer = require("puppeteer");
const fs = require('fs');

async function scrapeImages (username) {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    await page.goto('https://www.instagram.com/accounts/login/')

    await page.type('[name=username]','xxxxxx@gmail.com')
    await page.type('[name=password]','xxxxxx')

    await page.click('[type=submit]')
    await page.goto(`https://www.instagram.com/${username}`);

    await page.waitForSelector('img', {
        visible: true,
    })

    const data = await page.evaluate( () => {
        const images = document.querySelectorAll('img');
        const urls = Array.from(images).map(v => v.src + '||');
        return urls;
    } );


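    // Recent Node versions reject a plain array here; join it into the same comma-separated text
    fs.writeFileSync('./myData2.txt', data.join(','));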


    return data;
}
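The question's `scrapeImages` never calls `browser.close()`. That is harmless while debugging with `headless: false`, but once the function runs on a server for every request, each call would leave a Chromium process behind. A minimal sketch of a try/finally wrapper follows; the helper name `withBrowser` is hypothetical, and the launcher is passed in (for example `() => puppeteer.launch()`) so the sketch itself does not depend on Puppeteer.

```javascript
// Hypothetical helper: launch a browser, run `task` with it, and always close it.
// `launch` is injected, e.g. () => puppeteer.launch(), to keep this sketch dependency-free.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    // Whatever `task` resolves to (or throws) is passed through to the caller.
    return await task(browser);
  } finally {
    // Runs on success and on failure, so no browser process is left running.
    await browser.close();
  }
}

// Possible usage, assuming Puppeteer is installed:
// const urls = await withBrowser(() => puppeteer.launch(), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto(`https://www.instagram.com/${username}`);
//   return page.evaluate(() => Array.from(document.querySelectorAll('img'), (img) => img.src));
// });
```

Because the `finally` block runs whether the task resolves or throws, a `waitForSelector` that fails no longer leaks the browser.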

Live Server only serves static files, so requesting index.js from it just returns the file's source code; it never runs scrapeImages. You'll have to set up a Node server, with Express or anything else, then send the username in a GET or POST request (in the URL for GET, in the body for POST) and read it on the server with Node/Express. Then you can run Puppeteer with it.
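Express is only one option. As a sketch of the "anything else" route, the server below uses Node's built-in `http` module and shows where the username travels in each case: in the URL path for a GET request (the same `/instascraper/user/<username>` shape used in the example that follows) and in a JSON body for a POST request. Headers would also work, but the URL or the body is the conventional place. The function name `createScrapeServer`, the `/instascraper` POST path, and the injected `scrape` function are assumptions; in real use `scrape` would be the Puppeteer code. Unlike the Express example, this sketch returns the image URLs as a JSON array.

```javascript
const http = require('http');

// Hypothetical server built on Node's http module instead of Express.
// `scrape` is injected: in real use it would be the Puppeteer-based function.
function createScrapeServer(scrape) {
  return http.createServer(async (req, res) => {
    // The page is served from another origin (e.g. Live Server on port 5500),
    // so the browser needs this header before it lets the page read the reply.
    res.setHeader('Access-Control-Allow-Origin', '*');

    // Answer CORS preflight requests (browsers send one before a JSON POST).
    if (req.method === 'OPTIONS') {
      res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
      res.setHeader('Access-Control-Allow-Headers', 'Content-Type');
      res.statusCode = 204;
      res.end();
      return;
    }

    res.setHeader('Content-Type', 'application/json');
    const url = new URL(req.url, 'http://localhost');
    const match = url.pathname.match(/^\/instascraper\/user\/([^/]+)$/);
    let username = null;

    if (req.method === 'GET' && match) {
      // GET /instascraper/user/<username>: the username is a URL path segment.
      try {
        username = decodeURIComponent(match[1]);
      } catch {
        username = null; // malformed percent-encoding
      }
    } else if (req.method === 'POST' && url.pathname === '/instascraper') {
      // POST /instascraper with a JSON body such as {"username": "someone"}.
      let raw = '';
      for await (const chunk of req) raw += chunk;
      try {
        username = JSON.parse(raw).username;
      } catch {
        username = null; // body was not valid JSON
      }
    } else {
      res.statusCode = 404;
      res.end(JSON.stringify({ error: 'not found' }));
      return;
    }

    if (typeof username !== 'string' || username === '') {
      res.statusCode = 400;
      res.end(JSON.stringify({ error: 'username is required' }));
      return;
    }

    try {
      const images = await scrape(username);
      res.end(JSON.stringify({ images }));
    } catch (err) {
      res.statusCode = 500;
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}
```

Calling `createScrapeServer(scrapeImages).listen(8888)` would make `GET http://127.0.0.1:8888/instascraper/user/<username>` answer the page's AJAX request.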

For example, say your Node.js/Express server is running on port 8888. Your HTML would look like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    <form method="post">
        Username: <input type="text" name="username" id="username">&nbsp;&nbsp;
        <button id="clickMe" type="button" value="clickme" onclick="getImages(this.form.username.value)">
        Scrape Account Images</button>
    </form>

    <div id="scrapedimages"></div>
    <script>
        let imgArray

        const getImages = (username) => {
            var xhttp = new XMLHttpRequest();
            xhttp.onreadystatechange = function () {
                if (this.readyState == 4 && this.status == 200) {
                    document.querySelector('#scrapedimages').innerHTML = ''
                    imgArray = JSON.parse(this.responseText)
                    if ( imgArray.images.length > 0 ) {
                        imgArray.images.split(',').forEach( function (source) {
                            var image = document.createElement('img')
                            image.src = source
                            document.querySelector('#scrapedimages').appendChild(image)
                        })
                    }
                }
            };
            xhttp.open('GET', 'http://127.0.0.1:8888/instascraper/user/' + encodeURIComponent(username), true);
            xhttp.send();
        }
    </script>
</body>
</html>
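The same request can also be written with `fetch` instead of `XMLHttpRequest`. The sketch below is a hypothetical rewrite of `getImages`: `fetchImageUrls`, `toUrlList`, `renderImages`, and `API_BASE` are illustrative names, `fetchImpl` defaults to the global `fetch` (available in browsers and Node 18+), and `toUrlList` accepts both the comma-joined string this server sends and a plain JSON array.

```javascript
// Base URL of the scraping server; adjust host/port to your setup.
const API_BASE = 'http://127.0.0.1:8888/instascraper/user/';

// Turn the response's "images" field into an array of URLs.
// A comma-joined string is split; an array is returned unchanged.
function toUrlList(images) {
  if (Array.isArray(images)) return images;
  if (typeof images !== 'string') return [];
  return images.split(',').filter((src) => src !== '');
}

// fetch-based counterpart of getImages(); `fetchImpl` can be replaced, e.g. in tests.
async function fetchImageUrls(username, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(API_BASE + encodeURIComponent(username));
  if (!response.ok) {
    throw new Error(`Scrape failed with HTTP ${response.status}`);
  }
  const payload = await response.json();
  return toUrlList(payload.images);
}

// Browser-only: replace the container's contents with one <img> per URL.
function renderImages(container, urls) {
  container.replaceChildren(
    ...urls.map((src) => {
      const img = document.createElement('img');
      img.src = src;
      return img;
    })
  );
}

// Usage in the page:
// fetchImageUrls(document.querySelector('#username').value)
//   .then((urls) => renderImages(document.querySelector('#scrapedimages'), urls))
//   .catch((err) => console.error(err));
```

Splitting on commas assumes no image URL contains a comma; having the server send a JSON array avoids that assumption.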

Then, on the Node.js server, your script would look like this:

const puppeteer = require('puppeteer')
const express = require('express')
const app = express()
const port = 8888

// Instagram credentials used to log in before scraping
const username = 'usernameInstaGram'
const password = 'passwordInstaGram'

const scrapeImages = async profile => {
    const browser = await puppeteer.launch()
    try {
        const [page] = await browser.pages()

        await page.goto('https://www.instagram.com/accounts/login/', {waitUntil: 'networkidle0', timeout: 0})

        await page.waitForSelector('[name=username]', {timeout: 0})
        await page.type('[name=username]', username)
        await page.waitForSelector('[name=password]', {timeout: 0})
        await page.type('[name=password]', password)

        // Click "Log in" and wait for the resulting navigation at the same time
        await Promise.all([
            page.waitForNavigation(),
            page.click('[type=submit]')
        ])

        // The search box only appears once the login has gone through
        await page.waitForSelector('input[placeholder="Search"]', {timeout: 0})
        await page.goto(`https://www.instagram.com/${profile}`, {waitUntil: 'networkidle0', timeout: 0})

        const selector = 'body section > main > div > header ~ div ~ div > article a[href] img[srcset]'
        await page.waitForSelector(selector, {visible: true, timeout: 0})

        // Collect the src of every post thumbnail on the profile page
        const data = await page.evaluate(sel => {
            const images = document.querySelectorAll(sel)
            return Array.from(images).map(img => img.src)
        }, selector)

        // JSON.stringify escapes quotes and backslashes; the front end splits "images" on commas
        return JSON.stringify({ images: data.join(',') })
    } finally {
        // Close the browser even if one of the steps above throws
        await browser.close()
    }
}

app.get('/instascraper/user/:userID', async (request, response) => {
    // The page is served from another origin, so allow it to read the response
    response.set({
        'Access-Control-Allow-Origin': '*',
        'Content-Type': 'application/json'
    })

    try {
        const content = await scrapeImages(request.params.userID)
        response.send(content)
    } catch (error) {
        response.status(500).send(JSON.stringify({ error: String(error) }))
    }
})

app.listen(port, () => {
    console.log(`Instascraper server listening on port ${port}!`)
})
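The server passes `timeout: 0` to `goto` and `waitForSelector`, which disables Puppeteer's timeouts, so if a selector never appears (for example after a markup change or a login challenge) the HTTP request waits indefinitely. One way to cap the whole scrape is a `Promise.race` against a timer, sketched below. `withTimeout` is a hypothetical helper, and it only stops the route from waiting: the scrape itself keeps running until `scrapeImages` finishes or closes its browser.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
// This only stops the caller from waiting; the underlying work is not cancelled.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever side wins, so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the route, `await withTimeout(scrapeImages(profile), 60000, 'scrapeImages')` would turn a hung scrape into a rejected promise after 60 seconds (an arbitrary example value), which the route should catch and report, for example with a 500 response.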


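Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.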

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM