
Node.js: scrape HTML data values from an external website

As this is my first question, I first want to say hello to the Stack Overflow community :)

I've started learning Node.js recently. I want to scrape currency values from here: https://www.dailyfx.com/forex-rates and then save them in a .txt file as an exercise.

I found the cheerio library and tried it.

HTML code from that page:

<tbody>
  <tr id="EURUSD" data-market-id="EURUSD" class="rates-now">
    <td>
      <span title="EURUSD">
        <a href="eur-usd">EURUSD</a>
      </span>
    </td>
    <td class="text-right rates-row-td">
      <span data-symbol="EURUSD" data-type="bid" data-value="1.19016" data-changescale="-1"></span>
    </td>
    <td class="text-right rates-row-td">
      <span data-symbol="EURUSD" data-type="ask" data-value="1.21016" data-changescale="-1"></span>
    </td>
    <td class="text-right rates-row-td">
      <span id="EURUSD-spread">0.60</span>
    </td>
    <td class="text-right rates-row-td">
      <span class="calendar-toggle-btn"></span>
    </td>
  </tr>
</tbody>

My node.js code:

var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

request("https://www.dailyfx.com/forex-rates", function(error, response, body) {
  if (error) {
    console.log("Error: " + error);
    return; // response and body are undefined on failure, so stop here
  }
  console.log("Status code: " + response.statusCode);

  // Parse the HTML returned by the server
  var $ = cheerio.load(body);

  // One table row per currency pair
  $('tr.rates-row').each(function(index) {
    var title = $(this).attr('data-market-id');
    console.log("Title: " + title);
    // Read data-value from the first span inside a rates cell
    var value = $(this).find('td.rates-row-td > span').attr('data-value');
    console.log(" Value= " + value);
    fs.appendFileSync('stara.txt', value + '\n');
  });
});

The output looks like this:

Status code: 200
Title: EURUSD
 Value= undefined
Title: USDJPY
 Value= undefined
Title: AUDUSD
 Value= undefined
Title: GBPUSD
 Value= undefined
Title: USDCAD
 Value= undefined

and so on. I don't know why the values of the data-value attribute are undefined.

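The values are inserted dynamically by JavaScript after the page loads. Cheerio only parses the HTML the server returns and does not run scripts, so the data-value attribute is not there yet and will always come back as undefined.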
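One way to check this yourself is to look at the raw HTML before any script runs. The sketch below uses only Node's built-in https module; countAttribute and fetchHtml are hypothetical helper names made up for this example, and it assumes the page answers a plain GET without redirects.

```javascript
const https = require('https');

// Count how many times an attribute such as data-value="..." appears in raw HTML.
// Assumes `name` is a plain attribute name with no regex special characters.
function countAttribute(html, name) {
  const pattern = new RegExp(`\\s${name}\\s*=`, 'g');
  return (html.match(pattern) || []).length;
}

// Download the HTML exactly as the server sends it (no scripts are executed).
// Note: https.get does not follow redirects.
function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

module.exports = { countAttribute, fetchHtml };
```

You could call it as fetchHtml('https://www.dailyfx.com/forex-rates').then(html => console.log(countAttribute(html, 'data-value'))). If the count is 0 while the browser's inspector shows the attribute, the values are being added client-side.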

You would need a tool that actually runs the page's JavaScript, such as Puppeteer, jsdom, or PhantomJS.
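As a rough sketch of the Puppeteer route, something like the following might work. It assumes Puppeteer is installed (npm install puppeteer) and that the rendered page still has the structure shown in the question (tr elements with data-market-id, spans with data-type="bid" / "ask"); the page may have changed since. formatRates, scrapeRates and saveRates are hypothetical names for this example.

```javascript
const fs = require('fs');

// URL taken from the question.
const RATES_URL = 'https://www.dailyfx.com/forex-rates';

// Turn [{ symbol, bid, ask }] into one text line per currency pair.
function formatRates(rates) {
  return rates.map((r) => `${r.symbol} bid=${r.bid} ask=${r.ask}\n`).join('');
}

// Open the page in headless Chrome so its scripts run, then read the rendered DOM.
async function scrapeRates(url) {
  // Loaded lazily so formatRates can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Wait until at least one bid span has received its data-value attribute.
    await page.waitForSelector('span[data-type="bid"][data-value]');
    // This callback runs inside the browser, not in Node.
    return await page.$$eval('tr[data-market-id]', (rows) =>
      rows.map((row) => {
        const bid = row.querySelector('span[data-type="bid"]');
        const ask = row.querySelector('span[data-type="ask"]');
        return {
          symbol: row.getAttribute('data-market-id'),
          bid: bid ? bid.getAttribute('data-value') : null,
          ask: ask ? ask.getAttribute('data-value') : null,
        };
      })
    );
  } finally {
    await browser.close();
  }
}

// Scrape once and write the result to a text file.
async function saveRates(file) {
  const rates = await scrapeRates(RATES_URL);
  fs.writeFileSync(file, formatRates(rates));
}

module.exports = { formatRates, scrapeRates, saveRates };
```

Calling saveRates('stara.txt').catch(console.error) would then write one line per pair. The parsing is kept separate from the formatting so the file output can be checked without launching a browser.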
