Consider the following hierarchy in DOM
<div class="bodyCells">
<div style="foo">
<div style="foo">
<div style="foo1"> 'contains the list of text elements I want to scrape' </div>
<div style="foo2"> 'contains the list of text elements I want to scrape' </div>
</div>
<div style="foo">
<div style="foo3"> 'contains the list of text elements I want to scrape' </div>
<div style="foo4"> 'contains the list of text elements I want to scrape' </div>
</div>
By using class name bodyCells , I need to scrape out the data from each of the divs one at a time (ie) Initially from 1st div, then from the next div and so on and store it in separate arrays. How can I possibly achieve this? (using puppeteer)
NOTE: I have tried using class name directly to achieve this but, it gives all the texts in a single array. I need to get data from each tag separately in different arrays.
Expected output:
array1=["text present within style="foo1" div tag"]
array2=["text present within style="foo2" div tag"]
array3=["text present within style="foo3" div tag"]
array4=["text present within style="foo4" div tag"]
As you noted, you can fetch each of the texts in a single array using the class name. Next, if you iterate over each of those, you can create a separate array for each subsection.
I created a fiddle here - https://jsfiddle.net/32bnoey6/ - with this example code:
const cells = document.getElementsByClassName('bodyCells');
const scrapedElements = [];
for (var i = 0; i < cells.length; i++) {
const item = cells[i];
for (var j = 0; j < item.children.length; j++) {
const outerDiv = item.children[j];
const innerDivs = outerDiv.children;
for (var k = 0; k < innerDivs.length; k++) {
const targetDiv = innerDivs[k];
scrapedElements.push([targetDiv.innerHTML]);
}
}
}
console.log(scrapedElements);
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.