How to extract the href attribute from HTML source code

Question

This is the HTML source code that I am dealing with:

<a href="/people/charles-adams" class="gridlist__link">

What I want to do is extract the href attribute, in this case "/people/charles-adams", with the BeautifulSoup module. I need this because I want to get the HTML source code of that particular webpage and search it with the soup.findAll method. But I am struggling to extract such attributes from the webpage. Could anyone help me with this problem?

PS: I am using this method to get the HTML source code with the Python module BeautifulSoup:

import requests
from bs4 import BeautifulSoup

request = requests.get(link, headers=header)
html = request.text
soup = BeautifulSoup(html, 'html.parser')
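Because the live page isn't available here, a minimal sketch can feed the snippet from the question straight into BeautifulSoup instead of fetching it with requests. It shows the two usual ways to read an attribute from a tag: dict-style indexing, which raises KeyError when the attribute is missing, and `.get()`, which returns None instead. The `title` lookup is only there to illustrate a missing attribute.

```python
from bs4 import BeautifulSoup

# The anchor from the question, parsed directly instead of fetched from a page.
html = '<a href="/people/charles-adams" class="gridlist__link">'
soup = BeautifulSoup(html, 'html.parser')

anchor = soup.find('a')       # first <a> tag in the document, or None if absent
print(anchor['href'])         # -> /people/charles-adams (KeyError if href were missing)
print(anchor.get('title'))    # -> None, since this tag has no title attribute
```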

Answer 1

Try something like:

refs = soup.find_all('a')
for i in refs:
    if i.has_attr('href'):
        print(i['href'])

It should output:

/people/charles-adams
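The has_attr() check can also be pushed into the search itself: passing `href=True` to find_all() keeps only tags that carry an href attribute. A minimal sketch, using a small hypothetical HTML string (the second anchor is invented to show a tag without href being skipped):

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML; the second <a> has no href and should be skipped.
html = '''
<a href="/people/charles-adams" class="gridlist__link">Charles Adams</a>
<a name="top">no href here</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that have an href attribute,
# so no has_attr() check is needed inside the loop.
for a in soup.find_all('a', href=True):
    print(a['href'])  # -> /people/charles-adams
```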

Answer 2

You can tell BeautifulSoup to find all anchor tags with soup.find_all('a'). Then you can filter them with a list comprehension and get the links.

import requests
from bs4 import BeautifulSoup

request = requests.get(link, headers=header)
html = request.text
soup = BeautifulSoup(html, 'html.parser')

tags = soup.find_all('a')
tags = [tag for tag in tags if tag.has_attr('href')]
links = [tag['href'] for tag in tags]

links will be ['/people/charles-adams'].
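On a real page, find_all('a') will usually also return navigation and footer links. One option, sketched here under assumptions, is a CSS selector that targets the `gridlist__link` class from the question, followed by urljoin() so the relative paths can be passed back to requests.get(). `BASE_URL` and the "About" anchor are hypothetical placeholders, not taken from the original page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL; replace it with the address of the page you fetched.
BASE_URL = 'https://example.com/'

# Hypothetical sample HTML: only the first anchor has the gridlist__link class.
html = '''
<a href="/people/charles-adams" class="gridlist__link">Charles Adams</a>
<a href="/about">About</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# Select anchors with class "gridlist__link" that also have an href attribute.
links = [a['href'] for a in soup.select('a.gridlist__link[href]')]
# Turn relative paths into absolute URLs suitable for requests.get().
absolute = [urljoin(BASE_URL, href) for href in links]

print(links)     # -> ['/people/charles-adams']
print(absolute)  # -> ['https://example.com/people/charles-adams']
```

Each entry in `absolute` can then be fetched and parsed the same way as the original page.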

2 answers
Answer 1: score 0, posted 2019-09-23 00:16:57
Answer 2: score 0, posted 2019-09-23 00:23:05
