Alright, so the page I'm trying to scrape with Node.js and Puppeteer is structured like this:
<html lang = "en">
....
<html xmlns="https://www.w3.org/1999/xhtml" lang="en">
<a href = "link I'm trying to go to">Go to link</a>
</html>
</html>
I tried to click the link by selector and by XPath. Neither worked, and I triple-checked that both were right. I feel like it has something to do with this embedded html, and I don't know how to handle it. Can anyone help?
Other comments pointed out that content inside an iframe is not accessible from the parent document. I checked the code again, and it turns out it was actually structured like this:
<html lang = "en">
....
<iframe src = "url">
<html xmlns="https://www.w3.org/1999/xhtml" lang="en">
<a href = "link I'm trying to go to">Go to link</a>
</html>
</iframe>
</html>
So all I had to do was page.goto(url), where url is the iframe's src, and then I could scrape as normal. Thanks everyone!
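For anyone who lands here with the same problem, here is a rough sketch of that fix. It is not the exact code I ran: the function names, the selectors, and the URL passed in are all placeholders, and it assumes the link navigates in the same tab. It reads the iframe's src from the parent page, resolves it against the parent URL in case it is relative, and then navigates straight to it so the link can be clicked like on any normal page.

```javascript
// Resolve an iframe's src (which may be relative) against the parent page URL.
// Hypothetical helper, not part of Puppeteer.
function resolveFrameUrl(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

// Hypothetical function: open the parent page, jump to the iframe's own URL,
// and click the link there. Selectors are assumptions; adjust to the real page.
async function clickLinkInsideIframe(pageUrl, iframeSelector = 'iframe', linkSelector = 'a') {
  const puppeteer = require('puppeteer'); // loaded lazily, only when the function runs
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl);

    // Read the src attribute of the iframe from the parent document.
    const src = await page.$eval(iframeSelector, (el) => el.getAttribute('src'));

    // Navigate to the framed document directly; its content is now the top-level page.
    await page.goto(resolveFrameUrl(src, page.url()));

    // Click the link and wait for the resulting navigation (assumes same-tab navigation).
    await page.waitForSelector(linkSelector);
    await Promise.all([page.waitForNavigation(), page.click(linkSelector)]);
    return page.url();
  } finally {
    await browser.close();
  }
}

// Usage (placeholder URL):
// clickLinkInsideIframe('https://example.com/parent.html').then(console.log);
```

If you would rather stay on the parent page, Puppeteer also has elementHandle.contentFrame(), which returns the frame for an iframe element so you can query and click inside it. I did not need that here, but it is the other common way around this.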