How to extract the href value from a rel tag in Python
...html...
<link rel="image_src" href="image.jpg" />
....more html....
How can I extract the image URL using BeautifulSoup in Python?
Try this:
links = soup.find_all("link", {"rel": True})
for link in links:
    print(link.attrs["href"])
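Note that {"rel": True} matches every link tag that has a rel attribute, whatever its value. In BeautifulSoup, rel on a link tag is a multi-valued attribute, so link["rel"] is a list of strings rather than a single string. A minimal sketch, assuming a small inline HTML string with one image link and one stylesheet link:

```python
from bs4 import BeautifulSoup

html = '<link rel="image_src" href="image.jpg" /><link rel="stylesheet" href="style.css" />'
soup = BeautifulSoup(html, "html.parser")

# {"rel": True} matches every <link> that has a rel attribute, whatever its value
links = soup.find_all("link", {"rel": True})

# rel is multi-valued in BeautifulSoup, so link["rel"] is a list such as ['image_src']
rel_values = [link["rel"] for link in links]

# Keep only the links whose rel list contains "image_src"
image_urls = [link["href"] for link in links if "image_src" in link["rel"]]

print(rel_values)
print(image_urls)
```

This prints [['image_src'], ['stylesheet']] followed by ['image.jpg'].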
Use find() for a single item, or find_all() for more items:
for item in soup.find_all('link'):
    print(item['href'])
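find() returns only the first matching tag (or None when nothing matches), while find_all() returns a list of every match, which may be empty. A minimal sketch, assuming a small inline HTML string in place of the real page:

```python
from bs4 import BeautifulSoup

html = '''
<link rel="stylesheet" href="style.css" />
<link rel="image_src" href="image.jpg" />
'''
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None if there is no match
first = soup.find("link", rel="image_src")
image_url = first["href"] if first is not None else None

# find_all() returns a (possibly empty) list of every matching Tag
all_hrefs = [link["href"] for link in soup.find_all("link")]

print(image_url)
print(all_hrefs)
```

This prints image.jpg and then ['style.css', 'image.jpg'].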
You can also use {'href': True} to make sure that each matched link has an href, and {'rel': 'image_src'} to make sure that it is the image link:
for item in soup.find_all('link', {'href': True, 'rel': 'image_src'}):
    print(item['href'])
Minimal working example:
from bs4 import BeautifulSoup as BS
text = '''
<link rel="image_src" />
<link rel="image_src" href="image1.jpg" />
<link rel="sound_src" href="hello.mp3" />
<link rel="image_src" href="image2.jpg" />
'''
soup = BS(text, 'html.parser')
for item in soup.find_all('link', {'href': True, 'rel': "image_src"}):
    print(item['href'])
Try a CSS selector:
soup.select_one('[rel="image_src"]')['href']
Or:
soup.select_one('link[rel="image_src"]')['href']
For multiple items:
for item in soup.select('[rel="image_src"]'):
    print(item['href'])
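select_one() returns None when the selector matches nothing, so indexing its result directly raises TypeError on a page without the tag. A hedged sketch, assuming markup like the minimal working example above; the extra attribute selector [href] additionally requires the href to be present, so the link without one is skipped:

```python
from bs4 import BeautifulSoup

html = '''
<link rel="image_src" />
<link rel="image_src" href="image1.jpg" />
<link rel="image_src" href="image2.jpg" />
'''
soup = BeautifulSoup(html, "html.parser")

# Match only <link> tags whose rel is image_src AND which carry an href
tag = soup.select_one('link[rel="image_src"][href]')
first_url = tag["href"] if tag is not None else None

# select() returns a list, so no None check is needed
urls = [t["href"] for t in soup.select('link[rel="image_src"][href]')]

print(first_url)
print(urls)
```

This prints image1.jpg and then ['image1.jpg', 'image2.jpg'].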
If soup is the BeautifulSoup object, then use:
hrefs = [link['href'] for link in soup.find_all('link') if link.get('href') is not None]
Beware that a link might not have an href attribute, and in that case link['href'] will raise KeyError. This is why I used link.get('href'), which returns None instead of raising, to check that the attribute exists.
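The difference between indexing and .get() can be seen directly. A minimal sketch, assuming one link tag without an href and one with it; Tag.get() also accepts a default value, shown here with an empty string chosen only for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<link rel="image_src" /><link rel="image_src" href="a.jpg" />', "html.parser")
no_href, with_href = soup.find_all("link")

# Indexing a missing attribute raises KeyError
try:
    no_href["href"]
    raised = False
except KeyError:
    raised = True

# .get() returns None (or a supplied default) instead of raising
missing = no_href.get("href")
fallback = no_href.get("href", "")
present = with_href.get("href")

# The list comprehension from the answer skips the tag without an href
hrefs = [link["href"] for link in soup.find_all("link") if link.get("href") is not None]

print(raised, missing, repr(fallback), present, hrefs)
```

This prints True None '' a.jpg ['a.jpg'].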