
nightmarejs scrape multiple Elements with querySelectorAll

I'm trying to scrape some information from an Instagram profile page with nightmarejs (a phantomjs derivative using electron as a browser).

The goal is to get the alt attributes of all images on the profile (for example's sake, I focus only on the images before the "show more" button).

var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: true });

nightmare
  .goto('https://www.instagram.com/ackerfestival/')
  .evaluate(function () {
    let array = [...document.querySelectorAll('._icyx7')];
    return array.length;
  })
  .end()
  .then(function (result) {
    console.log(result);
  })
  .catch(function (error) {
    console.error('Search failed:', error);
  });

This example works: the array has a length of 12. The electron browser opens and closes, so everything is fine. But if I change the return value to the array itself, the electron browser never closes and I don't get a console.log.

What am I doing wrong? I want to get all the information from the images in an Array or Object.

The problem you're hitting is that document.querySelectorAll() returns a NodeList of DOM Element objects. Neither of those object types serializes well, and the return value from .evaluate() has to be serialized across the IPC boundary. I'm betting you're getting an empty array on the other side of your .evaluate() call?

The easiest answer here is to map out what, specifically, you want from the NodeList and return only that. Off the cuff, something like the following should get the idea across:

.evaluate(function(){
  return Array.from(document.querySelectorAll('._icyx7')).map(element => element.innerText);
})
.then((innerTexts) => {
  // ... do something with the inner texts of each element
})
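Since the question asks for the alt attributes rather than the inner text, the same idea can be sketched as a fuller script that returns plain objects. This is only a sketch under some assumptions: collectImageInfo and scrapeImageInfo are hypothetical helper names, and it is a guess that '._icyx7' matches either the img elements themselves or a wrapper around them. The extractor is passed to .evaluate() together with the selector as an argument, because the function runs in the page and cannot see variables from the Node side.

```javascript
// Runs inside the page. It must be self-contained: Nightmare serializes the
// function and executes it in the browser, so it cannot use outer variables.
// (collectImageInfo is a hypothetical helper name, not part of Nightmare.)
function collectImageInfo(selector) {
  return Array.from(document.querySelectorAll(selector)).map(function (el) {
    // Assumption: the selector matches either an <img> or a wrapper around one.
    var img = el.tagName === 'IMG' ? el : el.querySelector('img');
    // Return only plain strings/nulls so the result survives serialization.
    return {
      alt: img ? img.getAttribute('alt') : null,
      src: img ? img.getAttribute('src') : null
    };
  });
}

// Hypothetical wrapper: opens the page, waits for the selector, and resolves
// with the array of plain objects produced by collectImageInfo.
function scrapeImageInfo(url, selector) {
  var Nightmare = require('nightmare'); // loaded only when actually scraping
  return Nightmare({ show: false })
    .goto(url)
    .wait(selector)
    .evaluate(collectImageInfo, selector)
    .end()
    .then(function (images) {
      return images;
    });
}
```

Calling scrapeImageInfo('https://www.instagram.com/ackerfestival/', '._icyx7') and chaining .then(console.log).catch(console.error) would then print the collected objects, provided the selector still matches anything on the page. Returning plain objects instead of elements is the design choice that matters here: only data that can be serialized makes it back out of .evaluate().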
