
Node.js: scrape HTML data values from an external website

As this is my first question, I'd first like to say hello to the Stack Overflow community :)

I've started to learn node.js recently. I want to scrape currency values from here: https://www.dailyfx.com/forex-rates and then save them in a .txt file as an exercise.

I found the cheerio.js library and tried it.

HTML code from that page:

<tbody>
  <tr id="EURUSD" data-market-id="EURUSD" class="rates-now">
    <td>
      <span title="EURUSD">
        <a href="eur-usd">EURUSD</a>
      </span>
    </td>
    <td class="text-right rates-row-td">
      <span data-symbol="EURUSD" data-type="bid" data-value="1.19016" data-changescale="-1"></span>
    </td>
    <td class="text-right rates-row-td">
      <span data-symbol="EURUSD" data-type="ask" data-value="1.21016" data-changescale="-1"></span>
    </td>
    <td class="text-right rates-row-td">
      <span id="EURUSD-spread">0.60</span>
    </td>
    <td class="text-right rates-row-td">
      <span class="calendar-toggle-btn"></span>
    </td>
  </tr>
</tbody>

My node.js code:

var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

request("https://www.dailyfx.com/forex-rates", function(error, response, body) {
  if(error) {
    console.log("Error: " + error);
    return; // no response to inspect when the request itself failed
  }
  console.log("Status code: " + response.statusCode);

  var $ = cheerio.load(body);

  $('tr.rates-row').each(function( index ) {
    var title = $(this).attr('data-market-id');
    console.log("Title: " + title);
    var value = $(this).find('td.rates-row-td > span').attr('data-value');
    console.log(" Value= " + value);
    fs.appendFileSync('stara.txt', value + '\n');
  });
});

The output looks like this:

Status code: 200
Title: EURUSD
 Value= undefined
Title: USDJPY
 Value= undefined
Title: AUDUSD
 Value= undefined
Title: GBPUSD
 Value= undefined
Title: USDCAD
 Value= undefined

and so on. I don't know why the values of the data-value attribute are undefined.

The content is being dynamically inserted with JavaScript. Cheerio only parses the HTML that the server returns and does not run the page's scripts, so data-value will always be undefined.
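A quick way to check this is to compare the raw HTML the server sends with what the browser's inspector shows. The sketch below uses a hypothetical helper, attributeValues, and two made-up sample strings (what the server actually returns for this page is an assumption). It is a crude regular-expression check for debugging, not an HTML parser.

```javascript
// Hypothetical helper: collect every value of `attr` that appears in an HTML string.
// A rough regular-expression check for debugging, not a real HTML parser.
function attributeValues(html, attr) {
  const pattern = new RegExp('\\s' + attr + '="([^"]*)"', 'g');
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

// Made-up sample: what the inspector shows after the page's scripts have run.
const renderedHtml = '<span data-symbol="EURUSD" data-type="bid" data-value="1.19016"></span>';
// Made-up sample: what the server might send before any script runs (assumed).
const rawHtml = '<span data-symbol="EURUSD" data-type="bid"></span>';

console.log(attributeValues(renderedHtml, 'data-value')); // one value found
console.log(attributeValues(rawHtml, 'data-value'));      // nothing found
```

Running attributeValues(body, 'data-value') inside the question's request callback would show whether the server response contains the attribute at all.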

You would need to use something that can execute the page's JavaScript, such as Puppeteer, jsdom, or PhantomJS.
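As a minimal sketch of the Puppeteer route: the code below assumes Puppeteer is installed (npm install puppeteer) and that the rendered page still uses the span[data-symbol][data-value] markup shown in the question. The function names scrapeRates and formatRates are made up for illustration, and the choice of waitUntil is a guess at what this page needs.

```javascript
const fs = require('fs');

// Hypothetical helper: turn [{symbol, type, value}, ...] into one text line per rate.
function formatRates(rates) {
  return rates.map((r) => `${r.symbol} ${r.type} ${r.value}`).join('\n') + '\n';
}

// Hypothetical helper: load the page in headless Chrome so its scripts run,
// then read the attributes from the rendered DOM.
async function scrapeRates(url) {
  const puppeteer = require('puppeteer'); // loaded lazily; needs `npm install puppeteer`
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Assumed selector, taken from the HTML in the question.
    await page.waitForSelector('span[data-symbol][data-value]');
    return await page.$$eval('span[data-symbol][data-value]', (spans) =>
      spans.map((s) => ({
        symbol: s.getAttribute('data-symbol'),
        type: s.getAttribute('data-type'),
        value: s.getAttribute('data-value'),
      }))
    );
  } finally {
    await browser.close();
  }
}

// Usage (needs network access):
// scrapeRates('https://www.dailyfx.com/forex-rates')
//   .then((rates) => fs.writeFileSync('stara.txt', formatRates(rates)))
//   .catch((err) => console.error('Error: ' + err));
```

Unlike the cheerio version, this reads the attributes only after the browser has executed the page's scripts, which is why it is slower and heavier than a plain request.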
