How to get 'href' from an HTML tag using BeautifulSoup
I am trying to extract an image link from a table. I have gotten as far as the "td" tag, but I can't get the link inside it. Here is my code:
from bs4 import BeautifulSoup
import requests

def get_html(url):
    r = requests.get(url)
    r.encoding = 'utf8'
    return r.text

data = '''
<td class="cover" valign="top">
<a href="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" rel="lightbox"><img alt="Пьесы" src="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" title="Пьесы"/></a>
</td>
'''

def get_dt(html):
    soup = BeautifulSoup(html, 'lxml')
    a = soup.findAll('table')[1].findAll('tr')
    for tr in range(len(a)):
        b = a[tr].findAll('td')
        for td in range(len(b)):
            if tr == 0 and td == 0:
                c = b[td]
                print(c.get('href'))

def get_dt2(html):
    soup = BeautifulSoup(html, 'lxml')
    print(soup.get('href'))

# link = 'http://www.rech-deti.ru/catalog/7/61021/'
get_dt2(data)
I keep getting the output:
None
or, if I use
soup['href']
I get:
Traceback (most recent call last):
  File "C:/Users/Vlad/PycharmProjects/Ultimate_Parser/Rech/rech table test.py", line 42, in <module>
    get_dt2(data)
  File "C:/Users/Vlad/PycharmProjects/Ultimate_Parser/Rech/rech table test.py", line 38, in get_dt2
    print(soup['href'])
  File "C:\Users\Vlad\PycharmProjects\Ultimate_Parser\venv\lib\site-packages\bs4\element.py", line 1401, in __getitem__
    return self.attrs[key]
KeyError: 'href'
I have tried the answers from this question: Get item from bs4.element.Tag, but neither of them worked.
Try this to get all the <a> elements that contain an href attribute:
def get_dt2(html):
    soup = BeautifulSoup(html, 'lxml')
    for a in soup.find_all('a', href=True):
        print(a['href'])
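For context on why the original attempts return None: the href attribute belongs to the <a> tag nested inside the <td>, not to the <td> or to the soup object, so the lookup has to happen on the <a> tag itself. Below is a minimal sketch of that idea using the fragment from the question. The helper name get_cover_href and its base_url parameter are hypothetical, and 'html.parser' is swapped in for 'lxml' only so the example runs without an extra dependency. The urljoin step is an optional addition, on the assumption that an absolute image URL is wanted in the end.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Same fragment as in the question.
data = '''
<td class="cover" valign="top">
<a href="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" rel="lightbox"><img alt="Пьесы" src="/upload/iblock/ea7/ea72966465cde6ae6674321dcd95d1af.jpg" title="Пьесы"/></a>
</td>
'''


def get_cover_href(html, base_url=None):
    """Return the href of the first <a> inside <td class="cover">, or None.

    Hypothetical helper: the name and the base_url parameter are
    illustrative and do not come from the original post.
    """
    # 'html.parser' is an assumption made to avoid requiring lxml.
    soup = BeautifulSoup(html, 'html.parser')
    td = soup.find('td', class_='cover')
    if td is None:
        return None
    # href=True matches only <a> tags that actually carry an href attribute.
    a = td.find('a', href=True)
    if a is None:
        return None
    href = a['href']
    # Optionally resolve the site-relative path against the page URL.
    return urljoin(base_url, href) if base_url else href


soup = BeautifulSoup(data, 'html.parser')
print(soup.get('href'))             # the soup object has no href attribute
print(soup.find('td').get('href'))  # neither does the <td> tag
print(get_cover_href(data))
print(get_cover_href(data, 'http://www.rech-deti.ru/catalog/7/61021/'))
```

The first two prints show None for the same reason as in the question; the last two show the relative path and the absolute URL built from the page address that is commented out in the original code.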