Node.js & Puppeteer - How to select text wrapped inside an anchor tag

Question

I'm working on a project at the moment, have run into an error, and need your help!

Basically, I am trying to select the text wrapped inside the following anchor tag:

<a href="..." class="productDetailsLink js-productName">Product Name</a>

This is my current code:

 await page.waitForSelector('div > div > div > div > div > a[class = "productDetailsLink js-productName"')
        .then(() => page.evaluate(() => {
            const itemArray = [];
            const itemNodeList = document.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"');
            

            itemNodeList.forEach(item => {
                const itemTitle = item.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"').innerText;
                console.log(itemTitle);
            })
        } ))

However, I'm not having any luck. I've run out of ideas on how to scrape this text.

Answer 1

Not sure how Puppeteer works, but I've had great success using cheerio (https://www.npmjs.com/package/cheerio) for parsing HTML scraped with Phantom.

I think you can use Puppeteer, like Phantom, for the scraping and then run cheerio on the scraped HTML content, as shown below:

const cheerio = require('cheerio');
const $ = cheerio.load(content); // content is the scraped HTML string
const result = $('.productDetailsLink').text(); // text of the matching anchor(s)
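
To connect this to Puppeteer, here is a rough sketch, assuming both `puppeteer` and `cheerio` are installed, that takes the rendered HTML from `page.content()` and hands it to cheerio. `namesFromPage` is a hypothetical helper name, the URL is whatever page you are scraping, and the selector reuses the class from the question:

```javascript
// Hypothetical helper: loads `url` in Puppeteer, then parses the rendered HTML with cheerio.
async function namesFromPage(url) {
  // Required lazily, so the dependencies are only needed when the helper is called.
  const puppeteer = require('puppeteer');
  const cheerio = require('cheerio');

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // 'networkidle2' gives client-side scripts some time to render the links.
    await page.goto(url, { waitUntil: 'networkidle2' });
    const content = await page.content(); // serialized HTML of the whole page
    const $ = cheerio.load(content);
    // Collect the trimmed text of every matching anchor as a plain array.
    return $('a.productDetailsLink')
      .map((i, el) => $(el).text().trim())
      .get();
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}

// Usage (placeholder URL):
// namesFromPage('https://example.com/products').then(console.log);
```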

Answer 2

If those class attributes are unique to that particular anchor, <a href="..." class="productDetailsLink js-productName">Product Name</a>, the following method can be used:

await page.evaluate(() => {
 let anchorText = document.querySelector('a.productDetailsLink.js-productName').innerHTML;
 console.info("anchorText::", anchorText);
});

/*OR another way*/
await page.$eval('a.productDetailsLink.js-productName', e => e.innerHTML);

If there is a list of anchors:

await page.evaluate(() => {
 let anchorList = document.querySelectorAll('a.productDetailsLink.js-productName');
 anchorList.forEach(e => {
  let anchorText = e.innerHTML;
  console.info("anchorText::", anchorText);
 });
});
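
One caveat with the snippets above: `console.info` inside `page.evaluate` runs in the browser context, so its output does not appear in the Node.js terminal unless the page's console events are forwarded. A minimal sketch that returns the text to Node instead, assuming the same `a.productDetailsLink.js-productName` selector. `extractTexts` and `scrapeProductNames` are hypothetical helper names, and the URL is a placeholder:

```javascript
// Selector for the product links; the class names come from the question's markup.
const SELECTOR = 'a.productDetailsLink.js-productName';

// Runs in the browser: receives the matched elements and returns plain strings.
// It must be self-contained because Puppeteer serializes it before sending it to the page.
function extractTexts(elements) {
  return elements.map((el) => (el.textContent || '').trim());
}

// Hypothetical helper: opens `url` and resolves to an array of link texts.
async function scrapeProductNames(url) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when scraping
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.waitForSelector(SELECTOR); // wait until at least one link exists
    return await page.$$eval(SELECTOR, extractTexts);
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}

// Usage (placeholder URL):
// scrapeProductNames('https://example.com/products').then(console.log);
```

Returning the strings from `$$eval` keeps all logging and further processing on the Node side.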

Answer 3

.innerText worked for me (not .text or .innerHTML).

Credit: saw it here: https://learnscraping.com/nodejs-web-scraping-with-puppeteer/

For the selector: choose Inspect, then Copy -> JS path.

Below I copied the JS path of the "Advanced help" link here:

document.querySelector("#mdhelp-tabs > li.float-right > a")

Yes, it comes with "document.querySelector" and is all ready to paste into the Puppeteer Node.js code.
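
Note that the copied JS path is a browser-side expression, so in Puppeteer it has to run inside `page.evaluate`, or the selector string inside it can be passed to `page.$eval`. A small sketch of the second option: `selectorFromJsPath` and `textFromJsPath` are hypothetical helpers, and they only handle the simple `document.querySelector("...")` form shown above:

```javascript
// Hypothetical helper: pulls the CSS selector out of a copied "JS path" string
// such as document.querySelector("#mdhelp-tabs > li.float-right > a").
// Returns null when the string is not in that simple form.
function selectorFromJsPath(jsPath) {
  const pattern = /^document\.querySelector\((["'])((?:(?!\1).)*)\1\)$/;
  const match = pattern.exec(jsPath.trim());
  return match ? match[2] : null;
}

// Reads the visible text of the element that the JS path points to.
// `page` is an already opened Puppeteer page.
async function textFromJsPath(page, jsPath) {
  const selector = selectorFromJsPath(jsPath);
  if (selector === null) {
    throw new Error('Unsupported JS path: ' + jsPath);
  }
  return page.$eval(selector, (el) => el.innerText);
}
```

Longer JS paths, for example ones that step through a shadow root, are rejected by this helper and would need to go through `page.evaluate` instead.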

3 answers

Answer 1
1 2020-04-10 14:27:38

Answer 2
1 Accepted 2020-04-10 15:56:17

Answer 3
0 2021-03-24 10:24:23
