
Web scraping different websites and pushing values to an object

I'm trying to loop through different websites, scrape their values, and push them to a global variable. I've tried different things, but I can't seem to push val to dat. My goal is to have an object with the stock values of DAL and AAL.

var request = require("request"),
cheerio = require("cheerio");

var ticker = ["DAL", "AAL"];
var dat = []

for (var i = 0; i < ticker.length; i++) {
  var url = "https://finance.yahoo.com/quote/"+ ticker[i] + "?p=" + ticker[i];
    request(url, function (error, response, body) {
      if (!error) {
        var $ = cheerio.load(body);
        var val = {
          Ticker : ticker[i],
          "Date" : new Date(),
          PreviousClose : $("span[data-reactid='98']").text().toString(),
          Open : $("span[data-reactid='103']").text().toString(),
          Bid : $("span[data-reactid='108']").text().toString(),
          Ask : $("span[data-reactid='113']").text().toString(),
          DayRange : $("td[data-reactid='117']").text().toString(),
          WeekRange_52 : $("td[data-reactid='121']").text().toString(),
          Volume : $("span[data-reactid='126']").text().toString(),
          AverageVolume : $("span[data-reactid='131']").text().toString(),
          MarketCap : $("span[data-reactid='139']").text().toString(),
          Beta5Months : $("span[data-reactid='144']").text().toString(),
          PEratio : $("span[data-reactid='149']").text().toString(),
          "EPS" : $("span[data-reactid='154']").text().toString()
        };
      } else {
        return console.error(error);
      }
    });
    dat.push(val);
}

console.log(dat);

EDIT:

var request = require("request"),
cheerio = require("cheerio");

var ticker = ["DAL", "AAL"];
var dat = []

for (let i = 0; i < ticker.length; i++) { // let (not var) so each callback sees its own i
  var url = "https://finance.yahoo.com/quote/"+ ticker[i] + "?p=" + ticker[i];
    request(url, function (error, response, body) {
      if (!error) {
        var $ = cheerio.load(body);
        var val = {
          Ticker : ticker[i],
          "Date" : new Date(),
          PreviousClose : $("span[data-reactid='98']").text().toString(),
          Open : $("span[data-reactid='103']").text().toString(),
          Bid : $("span[data-reactid='108']").text().toString(),
          Ask : $("span[data-reactid='113']").text().toString(),
          DayRange : $("td[data-reactid='117']").text().toString(),
          WeekRange_52 : $("td[data-reactid='121']").text().toString(),
          Volume : $("span[data-reactid='126']").text().toString(),
          AverageVolume : $("span[data-reactid='131']").text().toString(),
          MarketCap : $("span[data-reactid='139']").text().toString(),
          Beta5Months : $("span[data-reactid='144']").text().toString(),
          PEratio : $("span[data-reactid='149']").text().toString(),
          "EPS" : $("span[data-reactid='154']").text().toString()
        }; 
        dat.push(val); // moved the push call inside this if statement so it's in the same scope as the val variable
      } else {
        console.error(error);
      }
    });
}

console.log(dat); // note: this runs before any request callback has fired, so it prints []

ORIGINAL:

The problem is that val is limited to the scope of the callback function it is created in: var is function-scoped, so val does not exist where dat.push(val) runs. (Read more about it here)

Try adding var val; above the request call. Note that this only works if request is a synchronous function; request is actually asynchronous (read more about asynchronous vs. synchronous here), which is why the EDIT above pushes inside the callback instead.

So it'd look something like this:

// .... keep the code up here
  var url = "https://finance.yahoo.com/quote/"+ ticker[i] + "?p=" + ticker[i];
    var val;
    request(url, function (error, response, body) {
      if (!error) {
        var $ = cheerio.load(body);
        val = {
// ... keep your code from down here

Basically, the val variable only exists inside the callback function, so the dat.push call outside it doesn't have access to it.
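
To make the scoping and the timing concrete, here is a minimal sketch that swaps the network call for a hypothetical fakeRequest helper built on setTimeout (it is an assumption for illustration, not part of the request library). It shows that a var declared inside a callback is not visible outside it, and that code placed after the call runs before the callback does.

```javascript
// fakeRequest is a hypothetical stand-in for request(): it calls back later,
// the way a real network request would.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, { statusCode: 200 }, "<html>" + url + "</html>");
  }, 0);
}

var order = [];

fakeRequest("DAL", function (error, response, body) {
  var val = { Ticker: "DAL" }; // local to this callback function
  order.push("callback ran");
});

// val was declared inside the callback, so it does not exist out here.
// typeof is used because reading an undeclared name directly would throw.
var visibleOutside = typeof val !== "undefined";
order.push("code after the call ran");

console.log(visibleOutside); // false
console.log(order);          // only "code after the call ran" so far
```

Anything that needs val, including the push into dat, therefore has to happen inside the callback.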

The code could be improved by running the requests concurrently using promises, but to answer the question as asked, do the work of pushing val inside the callback where it is defined (with the OP's placement of that line, val is not in scope and the push throws a ReferenceError). Similarly, there's no point in logging the accumulated data until the last callback invocation.

for (let i = 0; i < ticker.length; i++) { // let (not var) so each callback sees its own i
  var url = "https://finance.yahoo.com/quote/"+ ticker[i] + "?p=" + ticker[i];
    request(url, function (error, response, body) {
      if (!error) {
        var $ = cheerio.load(body);
        var val = {
          Ticker : ticker[i],
          "Date" : new Date(),
          PreviousClose : $("span[data-reactid='98']").text().toString(),
          Open : $("span[data-reactid='103']").text().toString(),
          Bid : $("span[data-reactid='108']").text().toString(),
          Ask : $("span[data-reactid='113']").text().toString(),
          DayRange : $("td[data-reactid='117']").text().toString(),
          WeekRange_52 : $("td[data-reactid='121']").text().toString(),
          Volume : $("span[data-reactid='126']").text().toString(),
          AverageVolume : $("span[data-reactid='131']").text().toString(),
          MarketCap : $("span[data-reactid='139']").text().toString(),
          Beta5Months : $("span[data-reactid='144']").text().toString(),
          PEratio : $("span[data-reactid='149']").text().toString(),
          "EPS" : $("span[data-reactid='154']").text().toString()
        };
        // relocated OP accumulating and logging code here:
        dat.push(val);
        // callbacks can finish in any order, so log once every ticker has reported
        if (dat.length === ticker.length) {
          console.log(dat);
        }
      } else {
        return console.error(error);
      }
    });
}
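
The promise-based improvement mentioned above could look roughly like this. It is a sketch under assumptions: fetchPage is a hypothetical function that takes a URL and returns a promise for the page body (for example a thin wrapper around request or another HTTP client), parseQuote is a hypothetical helper standing in for the cheerio selectors, and stubFetch replaces the network so the flow can be run as written.

```javascript
// Hypothetical helper: builds one record from a ticker and its page body.
// In the real scraper the cheerio selectors from the code above would go here.
function parseQuote(symbol, body) {
  return { Ticker: symbol, Date: new Date(), BodyLength: body.length };
}

// fetchPage is an assumed function of the form (url) => Promise<string>.
function scrapeAll(tickers, fetchPage) {
  var jobs = tickers.map(function (symbol) {
    var url = "https://finance.yahoo.com/quote/" + symbol + "?p=" + symbol;
    return fetchPage(url).then(function (body) {
      return parseQuote(symbol, body);
    });
  });
  // Resolves once every request has finished; results keep the order of tickers.
  return Promise.all(jobs);
}

// Stub that resolves immediately instead of hitting the network.
function stubFetch(url) {
  return Promise.resolve("<html>" + url + "</html>");
}

var done = scrapeAll(["DAL", "AAL"], stubFetch).then(function (dat) {
  console.log(dat.map(function (row) { return row.Ticker; })); // [ 'DAL', 'AAL' ]
  return dat;
});
```

Because map passes each symbol into its own function, there is no shared loop counter to go stale. Promise.all rejects as soon as any single request fails, so a real version would need a .catch (or Promise.allSettled) to decide what to do with partial results.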
