Web-scraping a website that is loaded with JavaScript (using JavaScript)

Question

I am trying to gather line-ups from football/soccer match reports. I decided to scrape the data from a reports provider, but their pages are populated with JavaScript.

To be more specific, let's take this link to a flashscores.co.uk match.

First, the site restricts CORS, so I used allorigins.me as a proxy to get around it, with this code:

function readurl(url, elementID) {
    // Route the request through the allorigins.me proxy to get around CORS.
    var proxyUrl = "http://allorigins.me/get?url=" + encodeURIComponent(url) + "&callback=?";
    var xhttp = new XMLHttpRequest();
    xhttp.onreadystatechange = function() {
        // readyState 4 means the request has finished; status 200 means success.
        if (this.readyState === 4 && this.status === 200) {
            document.getElementById(elementID).innerHTML = this.responseText;
        }
    };
    xhttp.open("GET", proxyUrl, true);
    xhttp.send();
}

The result was something like this, and it looks the same all the way down (only \n and \t, not the real content). I guess the problem is that the flashscores website uses JavaScript to load the data, and allorigins.me did not "wait" until the whole page had loaded. Here is another look, where it seems the content is being loaded with JavaScript.

The desired result is the starting elevens of both teams (Alonso M., Arrizabalaga K., Azpilicueta C., ...). I inspected the page and found that every name is inside an HTML tag: <div class="name">PLAYER'S NAME HERE</div>.

Any idea how to avoid both problems at once?

1. The CORS restriction
2. The delay before the page is "filled" with data by JavaScript

I am trying to use client-side languages (no PHP).

Thank you :)

Answer 1

There are a few problems with your question:

CORS is a restriction that browsers enforce on cross-origin requests. The pages you need are public, so if you fetch them outside a browser (for example from Node.js) there is nothing to get around.
The problem is not "waiting" until the page loads. The data is filled in by the page's scripts, so you need to run those scripts yourself.

I recommend using something like jsdom with Node.js for this task; it should be quite simple.
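As a rough illustration of that suggestion, here is a minimal sketch, assuming jsdom is installed (npm install jsdom) and that the target page's scripts run correctly under jsdom, which is not guaranteed for every site. The names collectNames and scrapeNames, the timeout values, and the example URL are hypothetical; the div.name selector comes from the question. It loads the page with script execution enabled, then polls the DOM until the player names appear or a timeout passes.

```javascript
// Collect the trimmed text of every <div class="name"> in a DOM document.
// Works with any object exposing querySelectorAll (browser or jsdom document).
function collectNames(document) {
  return Array.from(
    document.querySelectorAll('div.name'),
    (el) => el.textContent.trim()
  );
}

// Load a page, let its scripts run, and poll until names appear or time runs out.
// timeoutMs and pollMs are illustrative values, not tuned for any particular site.
async function scrapeNames(url, timeoutMs = 10000, pollMs = 250) {
  const { JSDOM } = require('jsdom'); // third-party dependency, loaded lazily
  const dom = await JSDOM.fromURL(url, {
    runScripts: 'dangerously', // execute the page's own scripts (only for pages you trust)
    resources: 'usable',       // also fetch external <script> files
  });
  try {
    const deadline = Date.now() + timeoutMs;
    let names = collectNames(dom.window.document);
    while (names.length === 0 && Date.now() < deadline) {
      // Give the page's scripts time to fetch and render their data.
      await new Promise((resolve) => setTimeout(resolve, pollMs));
      names = collectNames(dom.window.document);
    }
    return names; // empty array if nothing showed up before the deadline
  } finally {
    dom.window.close(); // stop timers and pending requests inside jsdom
  }
}

// Usage (needs network access; the URL is a placeholder):
// scrapeNames('https://example.com/match').then(console.log).catch(console.error);
```

Because this runs in Node.js rather than in a browser, no CORS proxy is involved. If the page relies on browser features that jsdom does not implement, a headless browser may be needed instead.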

A great blog post about web scraping with Node.js (without script execution): here

Official jsdom npm page: here

Good luck!
