Nested Beautiful Soup classes
I am trying to fetch all classes (including the data inside "data_from" and "data_to") from the following structure:
<div class="alldata">
<div class="data_from">
<div class="data_to">
<div class="data_to">
<div class="data_from">
</div>
So far I have tried finding all classes, without success. The "data_from" and "data_to" classes are not being fetched by:
soup.find_all(class_=True)
When I try to iterate over the "alldata" class, I fetch only the first "data_from" class.
for data in soup.findAll('div', attrs={"class": "alldata"}):
    print(data.prettify())
All assistance is greatly appreciated. Thank you.
In newer code, avoid the old findAll() syntax, or mixing it with the new syntax. Use find_all() only. For more, take a minute to check the docs.
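As a minimal sketch of the find_all() spelling, the snippet below runs it against a shortened, valid version of the markup from the question. The sample HTML, the text inside the tags, and the variable names are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a valid, shortened version of the question's structure
html = '''<div class="alldata">
<div class="data_from">a</div>
<div class="data_to">b</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# find_all() is the current method name; class_=True matches any tag that has a class attribute
tags = soup.find_all('div', class_=True)

# "class" is a multi-valued attribute, so Beautiful Soup returns it as a list
class_names = [tag['class'][0] for tag in tags]
print(class_names)
```

Note that this also returns the outer "alldata" element itself, since find_all() searches every descendant of the soup object.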
Your HTML is not valid, but with valid HTML you could reach your goal using CSS selectors that select every <div> that has a class and is contained in your outer <div>:
soup.select('.alldata div[class]')
from bs4 import BeautifulSoup
html='''<div class="alldata">
<div class="data_from"></div>
<div class="data_to"></div>
<div class="data_to"></div>
<div class="data_from"></div>
</div>'''
soup = BeautifulSoup(html, 'html.parser')
soup.select('.alldata div[class]')
[<div class="data_from"></div>,
<div class="data_to"></div>,
<div class="data_to"></div>,
<div class="data_from"></div>]
In addition, if you want to get the text of each element, iterate over your ResultSet:
for e in soup.select('.alldata div[class]'):
    print(e.text)
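If you also need to know which class each piece of text came from, one possible approach is to pair the class name with the text while iterating. This is only a sketch: the city names inside the tags and the `rows` variable are made up for illustration, since the original markup shows no content.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the text inside each <div> is invented for this example
html = '''<div class="alldata">
<div class="data_from">Berlin</div>
<div class="data_to">Paris</div>
<div class="data_to">Rome</div>
<div class="data_from">Madrid</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Pair the first class name of each inner <div> with its stripped text
rows = [
    (e['class'][0], e.get_text(strip=True))
    for e in soup.select('.alldata div[class]')
]

for class_name, text in rows:
    print(class_name, text)
```

Keeping the pairs in a list preserves the document order, which matters here because "data_from" and "data_to" entries are interleaved.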