
How can I get the innerText of dynamic HTML tags using Puppeteer (Node.js) on TripAdvisor?

How can I get all 10 comments on this page, https://www.tripadvisor.com/Restaurant_Review-g294308-d3937445-Reviews-Maki-Quito_Pichincha_Province.html, with a loop or a Puppeteer function, using the innerText property?

The only solution I have come up with is to get the outerHTML of the whole comments container and then use substring operations to extract each comment, but that is not optimal and I think it's a more difficult approach. Maybe there is an easier solution in Puppeteer that I can't find?

I am doing this for educational purposes. The comments are in elements with class="partial_entry", and I want to get the innerText of each of these dynamically generated tags (I want all 10), like the ones you see here:

[enter image description here]

If I were to open the div <div class="review-container" data-reviewid="606551292" data-collapsed="true" data-deferred="false"><!--trkN:3-->, I would find another one with id="review_582693262". To get to the point: once I reach a <div> that has class="partial_entry", that is where my comment is located. I have tried a few things, but I get null because the element is not found, since the parent <div> of each comment has a unique id like id="review_xxxxxxxxx".

It's kind of difficult, since the review id is autogenerated (id="review_xxxxxxxxx") and I can't iterate with a loop by copying the CSS path, because I don't have a static parent.

Why not just select the elements that have the partial_entry class? That way the autogenerated parent ids don't matter. This works:

let comments = await page.evaluate(() =>
    [...document.querySelectorAll(".partial_entry")].map(item => item.textContent)
);
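
If the comments are rendered after the initial page load, the selector may not match anything yet when the query runs. Here is a minimal sketch of one way to handle that, assuming the markup described in the question (.review-container elements with a data-reviewid attribute, each containing a .partial_entry). The helper names getReviewTexts, getReviewsById and main are made up for illustration. It uses innerText, as asked in the question, instead of textContent; innerText reflects only the rendered text, so the results can differ slightly.

```javascript
// Sketch only: the helper names below are hypothetical, not part of Puppeteer.

// Wait until at least one comment exists, then read the text of every match.
async function getReviewTexts(page, selector = ".partial_entry") {
  await page.waitForSelector(selector);
  return page.$$eval(selector, nodes =>
    nodes.map(node => node.innerText.trim())
  );
}

// Pair each comment with its review id, taken from the data-reviewid
// attribute on the .review-container element shown in the question.
async function getReviewsById(page) {
  await page.waitForSelector(".review-container .partial_entry");
  return page.$$eval(".review-container", containers =>
    containers.map(container => {
      const entry = container.querySelector(".partial_entry");
      return {
        id: container.getAttribute("data-reviewid"),
        text: entry ? entry.innerText.trim() : null,
      };
    })
  );
}

// Example wiring; assumes puppeteer is installed (npm install puppeteer).
async function main() {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(
      "https://www.tripadvisor.com/Restaurant_Review-g294308-d3937445-Reviews-Maki-Quito_Pichincha_Province.html"
    );
    console.log(await getReviewTexts(page));
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Calling main().catch(console.error) runs the sketch against the page from the question. The functions passed to $$eval execute inside the browser, so they cannot use variables from the surrounding Node.js scope unless those are passed in as extra arguments.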

