
Web scraper with Node.js and cheerio?

Hi guys, I'm really stuck here, so close to a solution that it hurts :/ I'm trying to create a web scraper script.

So far I have:

  • Server set up at DigitalOcean
  • Working script
  • HTML response text downloads successfully

But I'm stuck trying to get the elements. Here is my working code up to this point:

var http = require('http');
var request = require('request');
var cheerio = require('cheerio');

http.createServer(function (req, res) {
    request('http://www.xscores.com/soccer', function (error, response, html) {
        if (!error && response.statusCode == 200) {
            var $ = cheerio.load(html);
            res.writeHead(200, { 'Content-Type': 'text/plain' });
            res.end('html:' + html);
        }
    });
}).listen(8080);
console.log('Server is running at http://178.62.253.206:8080/');

This is still a work in progress and I have not set up any database yet. The overall plan is to load all of this info into tables or div elements on my server page.

How could I loop through the elements at xscores with class "score_home_txt score_cell wrap", where the home team is located, and display them on my server?

It's built up like this:

<div class="score_teams  score_cell">
<div class="score_home score_cell">
<div class="score_home_txt score_cell wrap">
TRACTOR SAZI
</div>

I'm used to doing this with Excel VBA, and doing it with cheerio is quite a new experience.

Any help at all would be much appreciated.

Frederik

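The class attribute "score_home_txt score_cell wrap" in the markup above is three separate class names, not one. A CSS selector that matches an element carrying all of them chains the names with dots. The sketch below only illustrates that mapping; classAttrToSelector is a hypothetical helper, not a cheerio function, and in practice the selector string can simply be written by hand.

```javascript
// Hypothetical helper (not part of cheerio): turn a tag name and a
// space-separated class attribute into a compound CSS selector.
function classAttrToSelector(tag, classAttr) {
    // Split on any run of whitespace and drop empty pieces.
    var classes = classAttr.split(/\s+/).filter(Boolean);
    return tag + classes.map(function (name) {
        return '.' + name;
    }).join('');
}

var selector = classAttrToSelector('div', 'score_home_txt score_cell wrap');
console.log(selector);
// prints: div.score_home_txt.score_cell.wrap
```

The resulting string is what gets passed to $(...) after cheerio.load(html).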
This is how you can loop through to display names:

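var http = require('http');
var request = require('request');
var cheerio = require('cheerio');

http.createServer(function (req, res) {
    request('http://www.xscores.com/soccer', function (error, response, html) {
        if (error || response.statusCode != 200) {
            // Always answer, otherwise the browser waits until it times out.
            res.writeHead(502, { 'Content-Type': 'text/plain' });
            res.end('Could not fetch the page');
            return;
        }
        var $ = cheerio.load(html);
        var list_items = "";
        // One chained class per word in the element's class attribute.
        $('div.score_home_txt.score_cell.wrap').each(function (i, element) {
            // trim() drops the line breaks around the team name.
            var a = $(this).text().trim();
            list_items += "<li>" + a + "</li>";
            console.log(a);
        });
        // Separate name so the html argument is not overwritten.
        var page = "<ul>" + list_items + "</ul>";
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(page);
    });
}).listen(8080);
console.log('Server is running at http://178.62.253.206:8080/');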

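One thing to watch: the loop above puts scraped text straight into HTML, so a team name containing & or < could break the page. A minimal sketch of one way to guard against that, assuming the names have already been collected into an array. escapeHtml and renderList are hypothetical helpers written for this example, and the second sample name is made up.

```javascript
// Hypothetical helpers for illustration; neither is part of cheerio or request.

// Replace the five characters that are significant in HTML with entities.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Build the same <ul> markup as the loop above from an array of names.
function renderList(names) {
    var items = names.map(function (name) {
        return '<li>' + escapeHtml(name.trim()) + '</li>';
    });
    return '<ul>' + items.join('') + '</ul>';
}

// 'TRACTOR SAZI' comes from the question; the second name is invented.
console.log(renderList(['\nTRACTOR SAZI\n', 'A & B <United>']));
// prints: <ul><li>TRACTOR SAZI</li><li>A &amp; B &lt;United&gt;</li></ul>
```

Inside the .each callback you would push $(this).text() onto an array and pass that array to renderList once the loop has finished.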
