I am looking for an efficient way to scrape information formatted like the following using Puppeteer. Suppose a website has a list of items structured like this:
<div id="list">
  <div class="item" pos="0">
    <a href="www.somewebsite.com">
      <div class="nameToRetrieve"> Name 1 </div>
    </a>
  </div>
  <div class="item" pos="1">
    <a href="www.somewebsite.com">
      <div class="nameToRetrieve"> Name 2 </div>
    </a>
  </div>
  <div class="item" pos="2">
    <a href="www.somewebsite.com">
      <div class="nameToRetrieve"> Name 3 </div>
    </a>
  </div>
</div>
How can I retrieve the names (Name 1, Name 2 and Name 3)?
I have tried collecting them into an object to turn them into an array, but I am still confused about how to approach it.
const listOfStuff = document.getElementById('list').getElementsByClassName('item');
This does not have much to do with the Puppeteer API, I think. On modern browsers (ES6), converting the collection to an array with Array.from is elegant, and then you can just map over it. Note that I assumed nameToRetrieve only appears on the elements you want to retrieve, so there is no need to look up "list" first.
const names = Array.from(document.getElementsByClassName("nameToRetrieve")).map(x => x.textContent.trim());
console.log(names); // → ["Name 1", "Name 2", "Name 3"]
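The conversion step matters because the HTMLCollection returned by getElementsByClassName is only array-like: it has indexed entries and a length, but no map method. Array.from also accepts an optional mapping function, which folds both steps into one call. A minimal sketch, where `namesFromCollection` and the plain `fakeCollection` object are hypothetical stand-ins so it can run outside a browser:

```javascript
// HTMLCollection is array-like (indexed entries plus length) but has no .map.
// Array.from copies any array-like into a real array and can apply a mapping
// function to each entry while doing so.
function namesFromCollection(collection) {
  return Array.from(collection, (el) => el.textContent.trim());
}

// Hypothetical stand-in for an HTMLCollection so this runs in Node.js.
const fakeCollection = {
  0: { textContent: ' Name 1 ' },
  1: { textContent: ' Name 2 ' },
  2: { textContent: ' Name 3 ' },
  length: 3,
};

console.log(namesFromCollection(fakeCollection)); // [ 'Name 1', 'Name 2', 'Name 3' ]
```

In the browser you would pass the real collection instead: `namesFromCollection(document.getElementsByClassName("nameToRetrieve"))`.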
There is a special convenience method, page.$$eval, for this task in Puppeteer:
let result = await page.$$eval('.nameToRetrieve', names => names.map(name => name.textContent));
console.log(result);
This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.
The result will be:
[ ' Name 1 ', ' Name 2 ', ' Name 3 ' ]
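The result keeps the whitespace that surrounds each name in the markup. A minimal sketch of a complete script that trims it, assuming puppeteer is installed and that `url` points at a page containing the markup from the question; `extractNames`, `extractItems` and `scrapeNames` are hypothetical names, not part of the Puppeteer API:

```javascript
// Page function for $$eval: receives the array of elements matching the
// selector and returns their text with surrounding whitespace trimmed.
// Puppeteer serializes it and runs it inside the browser, so it must not
// reference any variables from the Node.js scope.
const extractNames = (elements) => elements.map((el) => el.textContent.trim());

// Variant that builds one object per ".item" element, for when the
// (assumed meaningful) "pos" attribute is needed alongside the name.
const extractItems = (items) =>
  items.map((item) => {
    const nameEl = item.querySelector('.nameToRetrieve');
    return {
      pos: Number(item.getAttribute('pos')),
      name: nameEl ? nameEl.textContent.trim() : null,
    };
  });

// Hypothetical entry point: opens `url` and returns the trimmed names.
async function scrapeNames(url) {
  // Required lazily so the helpers above can be loaded without puppeteer.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await page.$$eval('.nameToRetrieve', extractNames);
  } finally {
    // Always close the browser, even if navigation or evaluation fails.
    await browser.close();
  }
}
```

Passing a named function to `$$eval` works because Puppeteer serializes the page function before running it in the page, which is also why the helpers must be self-contained. To get one object per item instead of bare names, `await page.$$eval('#list .item', extractItems)` returns an array of `{ pos, name }` objects.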