
Node.js web-scraping

I'm trying to scrape a page to get a link and some text from a paragraph, but for some reason my code doesn't work. I have tried a lot, and every time it just gives me undefined.

var request = require('request');
var cheerio = require('cheerio');

request('https://bitskins.com', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    var $ = cheerio.load(html);
    $('p', '.chat-box-content').each(function(i, element){
        if($(this).attr('style') == 'height: 15px;'){
            console.log($(this));
        }
    });
  }
});

https://gyazo.com/b80465474a389657c44aeeb64888a006

I only want it to return the second and third lines, that is, the link and the price. What do I have to do? I'm new to this and I'm lost.

The problem is that when you request the page, the chat box is in a collapsed/hidden state, and all the <p> elements (which are apparently placeholders) are empty. If you open the chat box, some JavaScript on the page runs and populates the list.

Fortunately, you don't need to scrape the page at all. The page calls an API to populate the list, and you can just call that API yourself.

var request = require('request');

request.post('https://bitskins.com/api/v1/get_last_chat_messages', function (error, response, data) {
  if (!error && response.statusCode == 200) {
    var dataObject = JSON.parse(data);
    dataObject.data.messages.forEach(function (message) {
      // For some reason the message is JSON encoded as a string...
      var messageObject = JSON.parse(message);
      // The message object has a "message" field.
      // Just use a regex to parse out the link and the price.
      var link = messageObject.message.match(/href='([^']+)/)[1];
      var price = messageObject.message.match(/\$(\d+\.\d+)/)[1];
      console.log(link + " " + price);
    });
  }
});

You will probably want to add better error handling, convert the price into a number, etc.
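As a rough sketch of what that error handling and number conversion might look like, the parsing step could be pulled into a small helper. The name parseChatMessage and the sample message below are made up for illustration; the real message format may differ, so treat the regexes as assumptions carried over from the snippet above.

```javascript
// Hypothetical helper (not part of any library): parses one raw chat
// message, which is assumed to be a JSON-encoded string with a "message"
// field, and returns { link, price } or null if anything is missing.
function parseChatMessage(rawMessage) {
  var messageObject;
  try {
    messageObject = JSON.parse(rawMessage);
  } catch (e) {
    return null; // not valid JSON
  }
  if (!messageObject || typeof messageObject.message !== 'string') {
    return null; // no usable "message" field
  }
  // Same regexes as above; match() returns null when nothing is found,
  // so check before indexing into the result.
  var linkMatch = messageObject.message.match(/href='([^']+)/);
  var priceMatch = messageObject.message.match(/\$(\d+\.\d+)/);
  if (!linkMatch || !priceMatch) {
    return null;
  }
  return {
    link: linkMatch[1],
    price: parseFloat(priceMatch[1]) // convert "12.50" to the number 12.5
  };
}

// Made-up sample message, only to show the call.
var sampleMessage = JSON.stringify({
  message: "<a href='https://example.com/item/1'>Item</a> sold for $12.50"
});
var parsed = parseChatMessage(sampleMessage);
if (parsed) {
  console.log(parsed.link + " " + parsed.price);
}
```

Inside the forEach callback you would then call parseChatMessage(message) and skip any message for which it returns null, instead of indexing into the match results directly.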

