
Return JavaScript variable data from an external page using Node.js

I am trying to send a request to a page and grab the entire DOM, basically a crawl. The page defines a variable with some data in an inline script in the HTML (not in a separate script file). From my Node.js backend, where I use the request library, how can I request this page and return that variable's data? Here's an example:

http://some-page.com/index.html

<html>
    <head>
        <script>
            var my_var = {
                title: "Good title",
                description: "Nice description",
                page: 5
            };
        </script>
    </head>
</html>

If I visit the website, open the console, and type my_var, I can see its content, so it's a global variable.

How can I do something like this? I can use another request library if that is needed.

You are looking for jsdom: https://github.com/tmpvar/jsdom

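const { JSDOM } = require("jsdom");

// runScripts: "dangerously" tells jsdom to execute the page's inline scripts.
const dom = new JSDOM(`<body>
  <script>document.body.appendChild(document.createElement("hr"));</script>
</body>`, { runScripts: "dangerously" });

// The script will be executed and modify the DOM:
console.log(dom.window.document.body.children.length === 2); // true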

It also comes with a virtual console:

Virtual consoles

Like web browsers, jsdom has the concept of a "console". This records both information directly sent from the page, via scripts executing inside the document, as well as information from the jsdom implementation itself.
