How to extract the href attribute from HTML source code

This is the HTML source code I am dealing with:

<a href="/people/charles-adams" class="gridlist__link">

What I want to do is extract the href attribute, in this case "/people/charles-adams", with the BeautifulSoup module. I need this because I then want to get the HTML source code of that particular linked page and run soup.findAll on it. But I am struggling to extract this attribute from the webpage. Could anyone help me with this problem?

P.S. I am using this method to get the HTML source code with the Python module BeautifulSoup:

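import requests
from bs4 import BeautifulSoup
# link and header (the page URL and request headers) are defined elsewhere
request = requests.get(link, headers=header)
html = request.text
soup = BeautifulSoup(html, 'html.parser')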
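To experiment without a network request, the same parsing step can be run on the snippet from the question passed in as a string. This is a minimal sketch: the `html` string below is only that one-line snippet, not the real page, and it shows reading a single tag's attribute with `.get()`.

```python
from bs4 import BeautifulSoup

# The snippet from the question, used in place of a live request.text
html = '<a href="/people/charles-adams" class="gridlist__link">'
soup = BeautifulSoup(html, 'html.parser')

anchor = soup.find('a')    # first <a> tag in the document, or None if there is none
href = anchor.get('href')  # .get() returns None instead of raising KeyError if href is missing
print(href)
```

Using `anchor['href']` also works, but it raises `KeyError` when the attribute is absent.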

Try something like:

refs = soup.find_all('a')
for i in refs:
    if i.has_attr('href'):
        print(i['href'])

For the snippet in the question, it should output:

/people/charles-adams
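BeautifulSoup can also do the `has_attr` filtering itself: passing `href=True` to `find_all` matches only tags that have an `href` attribute. The sketch below is equivalent to the loop above; it parses the question's snippet plus an extra `<a name="top">` anchor without `href`, which is added here only to show that the filter skips it.

```python
from bs4 import BeautifulSoup

# Question's snippet plus a hypothetical <a> without href, to show the filter
html = (
    '<a href="/people/charles-adams" class="gridlist__link"></a>'
    '<a name="top"></a>'
)
soup = BeautifulSoup(html, 'html.parser')

# href=True keeps only <a> tags that actually carry an href attribute
hrefs = [a['href'] for a in soup.find_all('a', href=True)]
print(hrefs)  # ['/people/charles-adams']
```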

You can tell BeautifulSoup to find all anchor tags with soup.find_all('a'). Then you can filter them with a list comprehension and collect the links.

import requests
from bs4 import BeautifulSoup

request = requests.get(link, headers=header)
html = request.text
soup = BeautifulSoup(html, 'html.parser')

tags = soup.find_all('a')
# Keep only anchors that actually have an href attribute
tags = [tag for tag in tags if tag.has_attr('href')]
links = [tag['href'] for tag in tags]

For the snippet in the question, links will be ['/people/charles-adams'].
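Since the goal is to fetch the linked pages afterwards, two refinements may help: restricting the search to the `gridlist__link` class seen in the question's snippet, and turning the relative paths into absolute URLs with `urllib.parse.urljoin`. This is a sketch under assumptions: `BASE_URL` is a hypothetical placeholder for the address of the page you actually requested, and the second `<a class="nav__link">` anchor is invented only to show that the class filter excludes it. Each resulting URL could then be passed to `requests.get(..., headers=header)` as in the question.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical base URL; replace with the address of the page you requested
BASE_URL = 'https://example.com/people'

html = (
    '<a href="/people/charles-adams" class="gridlist__link">Charles Adams</a>'
    '<a href="/about" class="nav__link">About</a>'
)
soup = BeautifulSoup(html, 'html.parser')

# Only <a> tags with class "gridlist__link" that also have an href attribute
links = [a['href'] for a in soup.find_all('a', class_='gridlist__link', href=True)]

# Resolve the relative paths against the base URL so requests.get() can fetch them
absolute = [urljoin(BASE_URL, link) for link in links]
print(absolute)  # ['https://example.com/people/charles-adams']
```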

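Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.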
