
Get value from nested JavaScript object in CasperJS

I'm trying to dig into a nested JavaScript array to grab the first instance of an object. Here's the code:

var utils = require('utils');
var casper = require('casper').create();

casper.start('http://en.wikipedia.org/wiki/List_of_male_tennis_players', function() {
  this.echo(this.getTitle());

  // Get info on all elements matching this CSS selector
  var tennis_info_text = this.evaluate(function() {
    var nodes = document.querySelectorAll('table.sortable.wikitable tbody tr');
    return [].map.call(nodes, function(node) { // Alternatively: return Array.prototype.map.call(...
      return node.textContent;
    });
  });

  // Split each row's text into fields and map it to an object literal
  var tennis_data = tennis_info_text.map(function(str) {
    var elements = str.split("\n");
    var data = {
      name       : elements[1],
      birth      : elements[2],
      death      : elements[3],
      country    : elements[4]
    };
    return data;
  });

  // Dump a few entries of the tennis_data array to the screen
  utils.dump(tennis_data.slice(1,5));
});

casper.run();

The output on stdout is:

{
    "name": "Acasuso, JoséJosé Acasuso",
    "birth": "1982",
    "death": "–",
    "country": " Argentina"
},
{
    "name": "Adams, DavidDavid Adams",
    "birth": "1970",
    "death": "–",
    "country": " South Africa"
},...

For the name field, I'm getting all the text of the name cell, which contains two elements when you look at the source of the target URL. What I want is just the second part, the element with class "fn"; for instance "David Adams" or "José Acasuso". I thought something like name: elements[1].something would work, but I've had no luck.

Additionally, how would I print the available object keys from the elements object?

The problem is that the first cell contains two elements that hold the player's name in different orders ("Last, First" and "First Last"). When you take the textContent of the whole cell, both representations end up in the same string, even though only one of them is visible in the browser. If you only want the visible one, you have to select that element explicitly.

You could write a custom function that removes the duplicate name from the string, but it is easier to take the textContent of the correct element directly.
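For completeness, the string-cleanup route could look like the sketch below. It assumes, based only on the output shown in the question, that the cell text is always the sort key "Last, First" immediately followed by the visible "First Last"; `visibleName` is a hypothetical helper name. Text that does not fit that pattern is returned unchanged, which is one reason reading the ".fn" element is more robust.

```javascript
// Hypothetical helper: recover the visible "First Last" name from a cell whose
// textContent is the hidden sort key "Last, First" directly followed by the
// visible name, e.g. "Adams, DavidDavid Adams". The pattern is an assumption.
function visibleName(text) {
  text = text.trim();
  // Try every split point: the left part must look like "Last, First"
  // and the right part must be exactly "First Last".
  for (var k = 1; k < text.length; k++) {
    var sortKey = text.slice(0, k);
    var shown = text.slice(k);
    var comma = sortKey.indexOf(", ");
    if (comma === -1) continue;
    var last = sortKey.slice(0, comma);
    var first = sortKey.slice(comma + 2);
    if (shown === first + " " + last) {
      return shown;
    }
  }
  // Pattern not found: return the (trimmed) text unchanged.
  return text;
}

console.log(visibleName("Acasuso, JoséJosé Acasuso")); // José Acasuso
```

The brute-force split is quadratic in the string length, which is harmless for short name cells.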

Reading the right element's textContent is easily done in the page context, inside evaluate():

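// Runs inside the casper.start() callback, so `this` is the casper instance
var tennis_data = this.evaluate(function() {
    var nodes = document.querySelectorAll('table.sortable.wikitable tbody tr');
    return [].map.call(nodes, function(node) {
        // Read each <td> separately instead of splitting the row's textContent
        var cells = [].map.call(node.querySelectorAll("td"), function(cell, i) {
            if (i === 0) {
                // Only the ".fn" element holds the visible name; fall back to
                // the whole cell if a row has no such element
                var fn = cell.querySelector(".fn");
                return fn ? fn.textContent : cell.textContent;
            } else {
                return cell.textContent;
            }
        });
        return {
            name: cells[0],
            birth: cells[1]
            // ...
        };
    });
});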
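If you would rather not write out every column by hand, the cells array can be zipped with a list of key names. This is a minimal sketch, assuming the columns appear in the order the question's output shows (name, birth, death, country); `cellsToRecord` is a hypothetical helper, and trimming the values is an extra choice not made in the answer above.

```javascript
// Hypothetical helper: zip an array of cell strings with a fixed list of keys.
// The key list and column order are assumptions taken from the question's output.
function cellsToRecord(cells) {
  var keys = ["name", "birth", "death", "country"];
  var record = {};
  keys.forEach(function (key, i) {
    // Trim stray whitespace such as the leading space in " Argentina";
    // cells missing from a short row stay undefined instead of throwing.
    record[key] = typeof cells[i] === "string" ? cells[i].trim() : undefined;
  });
  return record;
}

var record = cellsToRecord(["José Acasuso", "1982", "–", " Argentina"]);
console.log(record.name + " (" + record.country + ")"); // José Acasuso (Argentina)
```

Because evaluate() serializes its callback and runs it in the page, a helper like this would have to be defined inside that callback (not in the outer Casper script) before you could write `return cellsToRecord(cells);` in place of the object literal.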

As for printing the available keys of the elements object:

elements is an array of strings, so it has no keys you can access besides the array indices and the inherited array methods. The named keys (name, birth, death, country) only exist on the data objects built from it.
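The difference can be seen with Object.keys in plain JavaScript, without Casper or a DOM. The row string below is hypothetical, modeled on the question's output and assumed to start with a newline (which would leave elements[0] empty). In the Casper script itself, something like `utils.dump(Object.keys(tennis_data[1]));` would print the keys of one entry.

```javascript
// Hypothetical row text, modeled on the question's output.
var rowText = "\nAdams, DavidDavid Adams\n1970\n–\n South Africa\n";
var elements = rowText.split("\n");

// An array only has index keys (as strings).
console.log(Object.keys(elements).join(", ")); // 0, 1, 2, 3, 4, 5

// The object built from it carries the named keys.
var data = {
  name: elements[1],
  birth: elements[2],
  death: elements[3],
  country: elements[4]
};
console.log(Object.keys(data).join(", ")); // name, birth, death, country
```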
