
How do I make an HTTP request (for scraping purposes) to an Angular2 site?

I'm trying to use a Node server to scrape some information from an Angular2 application. The problem is that the response I get back is just the initial index file, essentially the "loading..." part of the page.

I'm using the npm request or request-promise package like this:

var rp = require("request-promise");

rp('https://someurl.com')
    .then((html) => {
        // Do something with the response -- at this point `html` is only
        // the initial document, before any client-side rendering has run
    })
    .catch((err) => {
        console.log(err);
    });

But I can't figure out whether it is possible to wait for the page to actually load. I've looked into using Angular Universal, but I need the data after everything has loaded, and the site owner is against using Universal.

Is there any way to make this work?

First of all, you need to get a fully rendered page. Unfortunately, JS-rendered web pages can't be scraped without going through the rendering process, but you can drive that process with a "headless" browser such as PhantomJS.

“A headless browser is a web browser without a graphical user interface. Headless browsers provide automated control of a web page in an environment similar to popular web browsers”
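For instance, here is a minimal PhantomJS script along those lines. It is only a sketch: the URL, the file name scrape.js, and the 3-second delay are assumptions, and in practice you would poll for a selector that only appears once the data has rendered rather than rely on a fixed timeout.

// scrape.js -- run with: phantomjs scrape.js https://someurl.com
var page = require('webpage').create();
var system = require('system');
var url = system.args[1];

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        return phantom.exit(1);
    }
    // Give Angular time to bootstrap and render; a fixed delay is crude,
    // but it demonstrates the idea of waiting for client-side rendering.
    setTimeout(function () {
        console.log(page.content);   // the fully rendered HTML
        phantom.exit();
    }, 3000);
});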

Here is a good example I found, which may suit you as a starting point: https://www.r-bloggers.com/web-scraping-javascript-rendered-sites/
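On the Node side you could then run that script as a child process and query the HTML it prints, for example with cheerio. Again just a sketch, assuming PhantomJS is on your PATH and the script above is saved as scrape.js:

var execFile = require('child_process').execFile;
var cheerio = require('cheerio');   // npm install cheerio

execFile('phantomjs', ['scrape.js', 'https://someurl.com'],
    { maxBuffer: 10 * 1024 * 1024 },           // rendered pages can be large
    function (err, stdout) {
        if (err) { return console.log(err); }
        var $ = cheerio.load(stdout);           // stdout is the rendered HTML
        console.log($('title').text());         // e.g. grab the page title
    });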

Also, you could check this article about SEO for AngularJS-powered sites; under "Spitting out the HTML Pages" you can find useful information: https://www.yearofmoo.com/2012/11/angularjs-and-seo.html#sptting-out-the-html-pages
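The idea in that section, roughly, is for the site to serve pre-rendered HTML snapshots to crawlers while regular visitors get the normal Angular app. A very rough sketch of that approach with Express; the snapshots directory and the _escaped_fragment_ check (Google's old AJAX-crawling convention) are illustrative assumptions:

var express = require('express');
var path = require('path');
var app = express();

app.use(function (req, res, next) {
    // The old AJAX-crawling scheme rewrites #! URLs to ?_escaped_fragment_=
    if (req.query._escaped_fragment_ !== undefined) {
        return res.sendFile(path.join(__dirname, 'snapshots', 'index.html'));
    }
    next();
});

app.use(express.static('app'));   // the regular Angular application
app.listen(3000);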
