How to get 'href' from an HTML tag using BeautifulSoup

I am trying to extract an image link from a table. I have gotten as far as the "td" tag, but I can't get the link inside it. Here is my code:

from bs4 import BeautifulSoup
import requests


def get_html(url):
    r = requests.get(url)
    r.encoding = 'utf8'
    return r.text


data = '''
<td class="cover" valign="top">
<a href="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" rel="lightbox"><img alt="Пьесы" src="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" title="Пьесы"/></a>
</td>
'''


def get_dt(html):
    soup = BeautifulSoup(html, 'lxml')
    a = soup.findAll('table')[1].findAll('tr')
    for tr in range(len(a)):
        b = a[tr].findAll('td')
        for td in range(len(b)):
            if tr == 0 and td == 0:
                c = b[td]
                print(c.get('href'))


def get_dt2(html):
    soup = BeautifulSoup(html, 'lxml')
    print(soup.get('href'))


# link = 'http://www.rech-deti.ru/catalog/7/61021/'
get_dt2(data)

I keep getting the output:

None

or, if I use

soup['href']

I get:

Traceback (most recent call last):
  File "C:/Users/Vlad/PycharmProjects/Ultimate_Parser/Rech/rech table test.py", line 42, in <module>
    get_dt2(data)
  File "C:/Users/Vlad/PycharmProjects/Ultimate_Parser/Rech/rech table test.py", line 38, in get_dt2
    print(soup['href'])
  File "C:\Users\Vlad\PycharmProjects\Ultimate_Parser\venv\lib\site-packages\bs4\element.py", line 1401, in __getitem__
    return self.attrs[key]
KeyError: 'href'

I have tried the answers from this question: Get item from bs4.element.Tag, but neither of them worked.

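The soup object represents the whole document and has no attributes of its own, so soup.get('href') returns None and soup['href'] raises a KeyError. The href attribute belongs to the a tag inside the td. Try this to get all the a elements that have an href attribute: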

def get_dt2(html):
    soup = BeautifulSoup(html, 'lxml')
    # href=True keeps only the <a> tags that actually carry an href attribute
    for a in soup.find_all('a', href=True):
        print(a['href'])
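
If only the link inside the cover cell is wanted, the same idea can be narrowed to that td. The following is a minimal sketch, not the asker's final code: the helper name get_cover_href and its base_url parameter are made up for illustration, and it uses the built-in 'html.parser' instead of 'lxml' so that it runs without extra dependencies. The base URL in the usage note is the page address that appears commented out in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample markup copied from the question.
data = '''
<td class="cover" valign="top">
<a href="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" rel="lightbox"><img alt="Пьесы" src="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" title="Пьесы"/></a>
</td>
'''


def get_cover_href(html, base_url=None):
    """Hypothetical helper: href of the first <a> inside td.cover, or None."""
    # Assumption: 'html.parser' (stdlib) is used here; the question uses 'lxml'.
    soup = BeautifulSoup(html, 'html.parser')

    # Find the cell first, then look for a link *inside* it.
    cell = soup.find('td', class_='cover')
    if cell is None:
        return None
    link = cell.find('a', href=True)
    if link is None:
        return None

    href = link['href']
    # The href in the sample is site-relative; resolve it if a base URL is given.
    return urljoin(base_url, href) if base_url else href
```

Calling get_cover_href(data) returns the relative path from the sample markup, and get_cover_href(data, 'http://www.rech-deti.ru/catalog/7/61021/') resolves it against that page address. Returning None when the cell or link is missing mirrors what Tag.get does for a missing attribute, so the caller can check for it instead of catching a KeyError.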
