Node.js & Puppeteer - How to select text wrapped inside an anchor tag
I'm working on a project at the moment, have run into an error, and need your help!
Basically, I am trying to select the text wrapped inside the following anchor tag:
<a href="..." class="productDetailsLink js-productName">Product Name</a>
This is my current code:
await page.waitForSelector('div > div > div > div > div > a[class = "productDetailsLink js-productName"')
.then(() => page.evaluate(() => {
const itemArray = [];
const itemNodeList = document.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"');
itemNodeList.forEach(item => {
const itemTitle = item.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"').innerText;
console.log(itemTitle);
})
} ))
However, I'm not having any luck. I've run out of ideas on how to scrape this text.
I'm not sure how Puppeteer works, but I've had great success using cheerio ( https://www.npmjs.com/package/cheerio ) for parsing HTML scraped with phantom.
I think you can use Puppeteer the same way as phantom for the scraping, and then run cheerio on the scraped HTML content, like this:
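const cheerio = require('cheerio');
// content is your scraped HTML (with Puppeteer, e.g. from await page.content())
const $ = cheerio.load(content);
// Class selectors must not contain a space after the dot
const result = $('.productDetailsLink').text();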
If those class attributes are unique to that particular anchor <a href="..." class="productDetailsLink js-productName">Product Name</a>, the following method can be used:
await page.evaluate(() => {
let anchorText = document.querySelector('a.productDetailsLink.js-productName').innerHTML;
console.info("anchorText::", anchorText);
});
/*OR another way*/
await page.$eval('a.productDetailsLink.js-productName', e => e.innerHTML);
If there is a list of anchors:
await page.evaluate(() => {
let anchorList = document.querySelectorAll('a.productDetailsLink.js-productName');
anchorList.forEach(e => {
let anchorText = e.innerHTML;
console.info("anchorText::", anchorText);
});
});
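Note that console.info inside page.evaluate runs in the browser context, so its output goes to the page's console rather than to the Node.js terminal. A minimal sketch that returns the text to Node.js instead, assuming page is an already-open Puppeteer page that has navigated to the target site. The helper name getAnchorTexts is made up for illustration:

```javascript
// Hypothetical helper: collects the visible text of every matching anchor
// and returns it to Node.js as an array of strings.
// `page` is assumed to be a Puppeteer Page that has already navigated.
async function getAnchorTexts(page, selector = 'a.productDetailsLink.js-productName') {
  // Wait until at least one matching anchor exists in the DOM.
  await page.waitForSelector(selector);
  // $$eval runs the callback in the browser with all matching elements;
  // the returned array of strings is serialized back to Node.js.
  return page.$$eval(selector, anchors =>
    anchors.map(a => a.innerText.trim())
  );
}
```

It could then be called as const names = await getAnchorTexts(page); followed by console.log(names) on the Node.js side.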
.innerText worked for me (not .text or .innerHTML).
Credit: I saw it here: https://learnscraping.com/nodejs-web-scraping-with-puppeteer/
For the selector: right-click the element, choose Inspect, then Copy -> JS path.
Below is the JS path I copied for the "Advanced help" link here:
document.querySelector("#mdhelp-tabs > li.float-right > a")
Yes, it comes with "document.querySelector" and all, ready to paste into the Puppeteer Node.js code.
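One caveat: document only exists in the browser context, so the copied JS path works as-is only inside page.evaluate. Alternatively, just the selector string can be passed to page.$eval. A rough sketch of both options, assuming page is an open Puppeteer page; the helper names are hypothetical:

```javascript
// Selector string taken from the copied JS path above.
const ADVANCED_HELP_SELECTOR = '#mdhelp-tabs > li.float-right > a';

// Option 1 (hypothetical helper): pass only the selector to page.$eval,
// which finds the first match in the page and hands it to the callback.
async function getLinkTextWithEval(page) {
  return page.$eval(ADVANCED_HELP_SELECTOR, el => el.innerText);
}

// Option 2 (hypothetical helper): paste the copied JS path unchanged,
// but inside page.evaluate so that it executes in the browser.
async function getLinkTextWithEvaluate(page) {
  return page.evaluate(
    () => document.querySelector('#mdhelp-tabs > li.float-right > a').innerText
  );
}
```

Both return the link text to Node.js; the first keeps the selector in one place, which helps when it is reused with page.waitForSelector.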