Select a tag inside a class with bs4
I'm trying to get the href from this piece of HTML:
<h3 class="post-title entry-title" itemprop="name">
<a href="http://sslproxies24.blogspot.it/2016/10/01-10-16-free-ssl-proxies-1070.html">01-10-16 | Free SSL Proxies (1070)</a>
</h3>
So I wrote this script:
import urllib.request
from bs4 import BeautifulSoup
url = "http://sslproxies24.blogspot.it/"
soup = BeautifulSoup(urllib.request.urlopen(url))
for tag in soup.find_all("h3", "post-title entry-title"):
    links = tag.get("href")
But links doesn't contain anything. This is because the h3 element I selected with bs4 (class "post-title entry-title") has no "href" attribute; the href belongs to the "a" element inside it.
In fact, the output of:
print(tag.attrs)
is:
{'itemprop': 'name', 'class': ['post-title', 'entry-title']}
How can I select the "a" element and get the link from its href?
You can quickly solve it by getting the inner a element:
for tag in soup.find_all("h3", "post-title entry-title"):
    link = tag.a.get("href")
where tag.a is a shortcut for tag.find("a").
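One caveat with this shortcut is that tag.a is None when an h3 has no a inside it, so calling .get on it would raise an AttributeError. Here is a minimal sketch that guards against that. It parses an inline copy of the markup from the question instead of fetching the page, and the second h3 (the one without a link) is a made-up example, not something taken from the real site:

```python
from bs4 import BeautifulSoup

# Markup from the question, plus a hypothetical second h3 that has no link.
html = """
<h3 class="post-title entry-title" itemprop="name">
<a href="http://sslproxies24.blogspot.it/2016/10/01-10-16-free-ssl-proxies-1070.html">01-10-16 | Free SSL Proxies (1070)</a>
</h3>
<h3 class="post-title entry-title" itemprop="name">No link here</h3>
"""

# Naming the parser explicitly avoids the "no parser was specified" warning.
soup = BeautifulSoup(html, "html.parser")

links = []
for tag in soup.find_all("h3", "post-title entry-title"):
    a = tag.a  # same as tag.find("a"); None when the h3 contains no <a>
    if a is not None and a.get("href"):
        links.append(a["href"])

print(links)
```

Collecting the values in a list also avoids overwriting link on every pass through the loop.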
Or, you can match the a element directly with a CSS selector:
for a in soup.select("h3.post-title.entry-title > a"):
    link = a.get("href")
where the dot is a class selector (chaining two of them requires both classes) and > means a direct parent-child relationship.
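To see what the > combinator changes, here is a small sketch comparing it with the plain descendant combinator (a space). The markup and the /direct, /nested and /other hrefs are invented for illustration; they are not from the page in the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one direct-child link, one nested link, one h3 with a single class.
html = """
<h3 class="post-title entry-title"><a href="/direct">direct child</a></h3>
<h3 class="post-title entry-title"><span><a href="/nested">nested</a></span></h3>
<h3 class="post-title"><a href="/other">only one class</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")

# "h3.post-title.entry-title" requires both classes on the same h3.
# "> a" matches only <a> elements that are direct children of that h3.
child_links = [a.get("href") for a in soup.select("h3.post-title.entry-title > a")]

# A space instead of ">" matches <a> elements at any depth inside the h3.
descendant_links = [a.get("href") for a in soup.select("h3.post-title.entry-title a")]

print(child_links)
print(descendant_links)
```

If the links on the real page might be wrapped in another element, the descendant form is the more forgiving choice.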
Or, you can check the itemprop attribute instead of the class:
for a in soup.select("h3[itemprop=name] > a"):
    link = a.get("href")
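The same attribute check can also be written with find_all and an attrs dictionary. The sketch below shows both forms side by side on an inline sample. The second h3, whose a has no href, is a hypothetical case added to show how a[href] in the selector filters such elements out:

```python
from bs4 import BeautifulSoup

# First h3 is from the question; the second one is a made-up anchor without an href.
html = """
<h3 class="post-title entry-title" itemprop="name">
<a href="http://sslproxies24.blogspot.it/2016/10/01-10-16-free-ssl-proxies-1070.html">01-10-16 | Free SSL Proxies (1070)</a>
</h3>
<h3 itemprop="name"><a name="anchor-only">no href</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS form: [itemprop="name"] matches on the attribute value,
# and a[href] keeps only <a> elements that actually carry an href.
css_links = [a["href"] for a in soup.select('h3[itemprop="name"] > a[href]')]

# find_all form: filter h3 elements by attribute, then check the inner <a> by hand.
find_links = [
    h3.a["href"]
    for h3 in soup.find_all("h3", attrs={"itemprop": "name"})
    if h3.a is not None and h3.a.has_attr("href")
]

print(css_links)
print(find_links)
```

Both lists should end up with the same single URL here; which form to use is mostly a matter of taste.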