How to extract href value from rel tag in Python

...html...
<link rel="image_src" href="image.jpg" />
....more html....

How can I extract the image URL using BeautifulSoup in Python?

Try this:

links = soup.find_all("link", {"rel": True})
for link in links:
    print(link.attrs["href"])

Use find() for a single item, or find_all() for more items:

for item in soup.find_all('link'):
    print(item['href'])

You can also use {'href': True} to make sure that the link has an href attribute, and {'rel': 'image_src'} to make sure that it is an image link.

for item in soup.find_all('link', {'href': True, 'rel': 'image_src'}):
    print(item['href'])

Minimal working example:

from bs4 import BeautifulSoup as BS

text = '''
    <link rel="image_src" />
    <link rel="image_src" href="image1.jpg" />
    <link rel="sound_src" href="hello.mp3" />
    <link rel="image_src" href="image2.jpg" />
'''

soup = BS(text, 'html.parser')

for item in soup.find_all('link', {'href': True, 'rel': "image_src"}):
    print(item['href'])

Try a CSS selector:

soup.select_one('[rel="image_src"]')['href']

Or:

soup.select_one('link[rel="image_src"]')['href']

For multiple items:

for item in soup.select('[rel="image_src"]'):
    print(item['href'])
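
The selector approach can be combined with the href check from the earlier answer. The following is only a sketch, assuming the same sample markup as the minimal working example above; the variable names are illustrative. Adding [href] to the selector skips link elements that have no href attribute, and select_one() returns None when nothing matches, so it is worth checking before indexing.

```python
from bs4 import BeautifulSoup

# Sample markup reused from the minimal working example (an assumption)
html = '''
    <link rel="image_src" />
    <link rel="image_src" href="image1.jpg" />
    <link rel="sound_src" href="hello.mp3" />
    <link rel="image_src" href="image2.jpg" />
'''

soup = BeautifulSoup(html, "html.parser")

# [href] keeps only the elements that actually carry an href attribute
image_hrefs = [tag["href"] for tag in soup.select('link[rel="image_src"][href]')]
print(image_hrefs)

# select_one() returns None when nothing matches, so check before indexing
first = soup.select_one('link[rel="image_src"][href]')
first_href = first["href"] if first is not None else None
print(first_href)
```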

If soup is the BeautifulSoup object, then use:

hrefs = [link['href'] for link in soup.find_all('link') if link.get('href') is not None]

Beware that there might not be an href attribute, in which case link['href'] will raise a KeyError. This is why I used link.get('href') to check for existence.
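
To illustrate the difference between indexing and .get(), here is a small sketch using two hypothetical link tags, one of them without href. The markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Two illustrative tags: the first has no href, the second does
html = '<link rel="image_src" /><link rel="image_src" href="image.jpg" />'
soup = BeautifulSoup(html, "html.parser")

no_href, with_href = soup.find_all("link")

# Indexing a missing attribute raises KeyError
try:
    no_href["href"]
    raised = False
except KeyError:
    raised = True

# .get() returns None (or a supplied default) instead of raising
missing = no_href.get("href")
fallback = no_href.get("href", "")
present = with_href.get("href")

print(raised, missing, repr(fallback), present)
```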
