
How do I make an HTTP request (for scraping purposes) to an Angular2 site?

I'm trying to use a Node server to scrape some information from an Angular2 application. The problem is that the response I get back is just the initial index file, essentially the "loading..." part of the page.

I'm using the npm request or request-promise package like this:

var rp = require("request-promise");

rp('https://someurl.com')
    .then((html) => {
        // Do something with the response -- at this point `html` is only
        // the initial document, before any client-side rendering has run
    })
    .catch((err) => {
        console.log(err);
    });

But I can't figure out whether it is possible to wait for the page to actually load. I've looked into using Angular Universal, but I need the data after everything has loaded, and the site owner is against using Universal.

Is there any way to make this work?

First of all, you need to get a fully rendered page. Unfortunately, JS-rendered web pages can't be scraped without going through the rendering process, but you can drive that process with a "headless" browser such as PhantomJS.

“A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers”
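For instance, here is a minimal PhantomJS script along those lines. It is only a sketch: the URL, the file name scrape.js, and the 3-second delay are assumptions, and in practice you would poll for a selector that only appears once the data has rendered rather than rely on a fixed timeout.

// scrape.js -- run with: phantomjs scrape.js https://someurl.com
var page = require('webpage').create();
var system = require('system');
var url = system.args[1];

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        return phantom.exit(1);
    }
    // Give Angular time to bootstrap and render; a fixed delay is crude,
    // but it demonstrates the idea of waiting for client-side rendering.
    setTimeout(function () {
        console.log(page.content);   // the fully rendered HTML
        phantom.exit();
    }, 3000);
});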

Here is a good example I found, which may suit you as a starting point: https://www.r-bloggers.com/web-scraping-javascript-rendered-sites/
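On the Node side you could then run that script as a child process and query the HTML it prints, for example with cheerio. Again just a sketch, assuming PhantomJS is on your PATH and the script above is saved as scrape.js:

var execFile = require('child_process').execFile;
var cheerio = require('cheerio');   // npm install cheerio

execFile('phantomjs', ['scrape.js', 'https://someurl.com'],
    { maxBuffer: 10 * 1024 * 1024 },           // rendered pages can be large
    function (err, stdout) {
        if (err) { return console.log(err); }
        var $ = cheerio.load(stdout);           // stdout is the rendered HTML
        console.log($('title').text());         // e.g. grab the page title
    });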

Also, you could check this article about SEO for AngularJS-powered sites; under "Spitting out the HTML Pages" you can find useful information: https://www.yearofmoo.com/2012/11/angularjs-and-seo.html#sptting-out-the-html-pages
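The idea in that section, roughly, is for the site to serve pre-rendered HTML snapshots to crawlers while regular visitors get the normal Angular app. A very rough sketch of that approach with Express; the snapshots directory and the _escaped_fragment_ check (Google's old AJAX-crawling convention) are illustrative assumptions:

var express = require('express');
var path = require('path');
var app = express();

app.use(function (req, res, next) {
    // The old AJAX-crawling scheme rewrites #! URLs to ?_escaped_fragment_=
    if (req.query._escaped_fragment_ !== undefined) {
        return res.sendFile(path.join(__dirname, 'snapshots', 'index.html'));
    }
    next();
});

app.use(express.static('app'));   // the regular Angular application
app.listen(3000);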
