Puppeteer: get all <a> href links

Question

Hello, I am trying to scrape a web page and return all of the links inside it. Example of the HTML elements:

<a href="#/item/2sDSXbG">
<a href="#/item/4ssaSXbG">
<a href="#/item/Sawd432">

Here is my code:

let links = [];
let elements2 = document.querySelectorAll('a');
  for (var element2 of elements2)
  links.push(element2.textContent);

After I return the value and print it, I get an error telling me that my variable is not defined. My error:

UnhandledPromiseRejectionWarning: ReferenceError: links is not defined

End goal: I want to create an array of all the items in the list. I would then later parse the information so that it is just the text after /item/.

Answer 1

It seems this is what you need to achieve your goal with Puppeteer:

const hrefs = await page.evaluate(() => {
  let links = [];
  let elements2 = document.querySelectorAll('a');
  for (let element2 of elements2)
    links.push(element2.href);
  return links;
});

Answer 2

With $$eval:

let hrefs = await page.$$eval('a', as => as.map(a => a.href))
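For the second half of the goal (keeping only the text after /item/), a minimal sketch could look like the following. It assumes the hrefs array has already been returned from the page as shown above. The helper name extractItemIds and the example.com base URL are made up for illustration. Note that a.href yields the resolved absolute URL, whereas a.getAttribute('href') would return the raw attribute value such as #/item/2sDSXbG; the helper works with either form.

```javascript
// Hypothetical helper: keep only the part of each href that follows "/item/".
function extractItemIds(hrefs) {
  const marker = '/item/';
  return hrefs
    .map((href) => {
      const index = href.indexOf(marker);
      // Links without "/item/" are mapped to null and filtered out below.
      return index === -1 ? null : href.slice(index + marker.length);
    })
    .filter((id) => id !== null && id !== '');
}

// Sample input: the base URL is an assumption, since a.href returns absolute URLs.
const sampleHrefs = [
  'https://example.com/#/item/2sDSXbG',
  'https://example.com/#/item/4ssaSXbG',
  'https://example.com/#/item/Sawd432',
  'https://example.com/about',
];

console.log(extractItemIds(sampleHrefs));
```

Doing the string handling in Node after the hrefs come back keeps the function passed to page.evaluate or $$eval small, and the parsing can be checked without launching a browser.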

Answer 3

The anchors don't have any content, so textContent returns an empty string for each of them. You need something like this:

<a href="#/item/2sDSXbG">content1</a>
<a href="#/item/4ssaSXbG">content2</a>
<a href="#/item/Sawd432">content3</a>

3 solutions

Solution 1: 1, accepted, 2021-06-12 06:55:13

Solution 2: 1, 2021-06-13 01:27:26

Solution 3: 0, 2021-06-12 04:21:10
