Targeting <a> with a specific attribute using BeautifulSoup
I'm attempting to scrape a page that has a section like this:
<a name="id_631"></a>
<hr>
<div class="store-class">
<div>
<span><strong>Store City</strong></span>
</div>
<div class="store-class-content">
<p>Event listing</p>
<p>Event listing2</p>
<p>Event listing3</p>
</div>
<div>
Stuff about contact info
</div>
</div>
The page is a list of sections like that, and the only way to differentiate them is by the name attribute in the <a> tag.
So I'm thinking I want to target that, then use next_sibling to get the <hr>, then go to the next sibling again to get the <div class="store-class"> section. All I want is the info in that div tag.
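A minimal sketch of why counting "two siblings" from the <a> is fragile: in the parsed tree, the whitespace between tags is itself a sibling (a text node), so .next_sibling on the <a> returns "\n", not the <hr>. The markup is a trimmed copy of the snippet above, parsed with html.parser; next_tag_sibling is a hypothetical helper, and find_next_sibling() already does the same skipping.

```python
from bs4 import BeautifulSoup, Tag

# Trimmed snippet from the question, with the </strong> tag closed.
HTML = """<a name="id_631"></a>
<hr>
<div class="store-class">
<div>
<span><strong>Store City</strong></span>
</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
anchor = soup.find("a", attrs={"name": "id_631"})

# .next_sibling is the very next node: the newline text between </a> and <hr>.
print(anchor.next_sibling == "\n")  # True


def next_tag_sibling(node):
    """Hypothetical helper: return the next sibling that is a Tag, or None."""
    sibling = node.next_sibling
    while sibling is not None and not isinstance(sibling, Tag):
        sibling = sibling.next_sibling
    return sibling


hr = next_tag_sibling(anchor)   # the <hr>
div = next_tag_sibling(hr)      # the <div class="store-class">
print(hr.name, div["class"])    # hr ['store-class']

# find_next_sibling() skips text nodes and non-matching tags for you.
print(anchor.find_next_sibling("div", class_="store-class") is div)  # True
```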
I'm not sure how to target that <a> tag and move down two siblings, though. When I try print(soup.find_all('a', {"name":"id_631"})), that just gives me the tag itself, which has nothing in it.
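A minimal sketch of the difference, on a made-up two-anchor snippet: find_all() returns a list containing the (empty) tag, while find() returns the Tag itself, which is what find_next_sibling() can be called on. The select_one() line is an alternative and assumes Beautiful Soup 4.7 or later, which uses the soupsieve package for CSS selectors.

```python
from bs4 import BeautifulSoup

HTML = '<a name="id_630"></a><a name="id_631"></a><hr>'
soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns a list of Tag objects. The <a> has no children,
# so printing it shows only the empty tag.
matches = soup.find_all("a", attrs={"name": "id_631"})
print(matches)  # [<a name="id_631"></a>]

# find() returns the first match as a Tag (or None). The HTML "name"
# attribute must go through attrs=, because find()'s first parameter
# is already called name (it holds the tag name).
anchor = soup.find("a", attrs={"name": "id_631"})

# A CSS attribute selector picks out the same element.
same = soup.select_one('a[name="id_631"]')
print(same is anchor)  # True
print(anchor.find_next_sibling("hr").name)  # hr
```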
Here's my script:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.tandyleather.com/en/leathercraft-classes")
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.find("a", id="id_631").find_next_sibling("div", class_="store-class"))
But I get the error:
Traceback (most recent call last):
File "tandy.py", line 8, in <module>
print(soup.find("a", id="id_631").find_next_sibling("div", class_="store-class"))
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'
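A minimal reproduction of that error, assuming markup shaped like the question's: the <a> has a name attribute but no id, so find("a", id="id_631") matches nothing and returns None, and None has no find_next_sibling method. Checking for None before chaining turns the crash into a clearer message; the LookupError text is illustrative.

```python
from bs4 import BeautifulSoup

HTML = """<a name="id_631"></a>
<hr>
<div class="store-class"><p>Event listing</p></div>
"""
soup = BeautifulSoup(HTML, "html.parser")

# id= matches the id attribute, which this <a> does not have.
print(soup.find("a", id="id_631"))  # None

# Match on the attribute the tag really has, and check before chaining.
anchor = soup.find("a", attrs={"name": "id_631"})
if anchor is None:
    raise LookupError('no <a name="id_631"> found on the page')
section = anchor.find_next_sibling("div", class_="store-class")
print(section.p.get_text())  # Event listing
```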
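find_next_sibling() to the rescue. The <a> tag has a name attribute, not an id, so find("a", id="id_631") returns None, which is where the AttributeError comes from. Match on name through attrs instead (find()'s own first parameter is already called name):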
soup.find("a", attrs={"name": "id_631"}).find_next_sibling("div", class_="store-class")
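The one-liner above can be wrapped in a small helper that pulls out just the parts the question asks for. A minimal sketch, assuming every section looks like the snippet in the question; parse_store_section and the two-section sample are hypothetical, and the markup is inlined rather than fetched with requests.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two sections shaped like the one in the question.
HTML = """
<a name="id_630"></a>
<hr>
<div class="store-class">
<div><span><strong>Other City</strong></span></div>
<div class="store-class-content"><p>Other event</p></div>
<div>Other contact info</div>
</div>
<a name="id_631"></a>
<hr>
<div class="store-class">
<div><span><strong>Store City</strong></span></div>
<div class="store-class-content">
<p>Event listing</p>
<p>Event listing2</p>
<p>Event listing3</p>
</div>
<div>Stuff about contact info</div>
</div>
"""


def parse_store_section(soup, anchor_name):
    """Return (city, [events]) for the section following <a name=anchor_name>,
    or None if the anchor or its section is missing."""
    anchor = soup.find("a", attrs={"name": anchor_name})
    if anchor is None:
        return None
    section = anchor.find_next_sibling("div", class_="store-class")
    if section is None:
        return None
    city = section.find("strong").get_text(strip=True)
    content = section.find("div", class_="store-class-content")
    events = [p.get_text(strip=True) for p in content.find_all("p")]
    return city, events


soup = BeautifulSoup(HTML, "html.parser")
print(parse_store_section(soup, "id_631"))
# ('Store City', ['Event listing', 'Event listing2', 'Event listing3'])
```

One caveat of this design: find_next_sibling() keeps scanning past later anchors, so if a section were ever missing its div, the helper would return the following section's data instead of None.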
Also, html.parser has to be replaced with either lxml or html5lib.
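lxml and html5lib are separate packages, and BeautifulSoup raises FeatureNotFound when asked for a parser that isn't installed. A minimal sketch, assuming you want to prefer those parsers but still run without them; make_soup is a hypothetical helper, and falling back to html.parser may not help with whatever made the answer recommend switching on the live page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: parse with the first installed parser.

    Returns (soup, parser_name). lxml and html5lib are third-party;
    BeautifulSoup raises FeatureNotFound if the requested one is missing.
    """
    for parser in parsers:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            continue
    raise RuntimeError("none of the requested parsers is installed")


soup, used = make_soup('<a name="id_631"></a><hr><div class="store-class">Store City</div>')
print("parsed with", used)
section = soup.find("a", attrs={"name": "id_631"}).find_next_sibling("div", class_="store-class")
print(section.get_text())  # Store City
```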
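Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0; if you republish, please credit this site's URL or the original post's URL. For any questions, contact: yoyou2525@163.com.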