
Nested Beautiful Soup classes

I am trying to fetch all classes (including the data inside "data_from" and "data_to") from the following structure:

<div class="alldata">
  <div class="data_from">
  <div class="data_to">
  <div class="data_to">
  <div class="data_from">
</div>

So far I have tried finding all classes, without success. The "data_from" and "data_to" classes are not being fetched by:

soup.find_all(class_=True)

When I try to iterate over the "alldata" class, I fetch only the first "data_from" class.

for data in soup.findAll('div', attrs={"class": "alldata"}):
    print(data.prettify())

All assistance is greatly appreciated. Thank you.

In newer code, avoid the old findAll() syntax, and avoid mixing it with the new syntax; use find_all() only. For more, take a minute to check the docs.
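
As a minimal sketch of that advice, the loop from the question could be written with find_all() and the class_ keyword. The HTML string below is a made-up, well-formed stand-in for the question's structure, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed version of the structure from the question
html = ('<div class="alldata">'
        '<div class="data_from">a</div>'
        '<div class="data_to">b</div>'
        '</div>')

soup = BeautifulSoup(html, 'html.parser')

# find_all() with class_ replaces findAll(..., attrs={"class": ...})
outer_divs = soup.find_all('div', class_='alldata')
for data in outer_divs:
    print(data.prettify())
```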


Your HTML is not valid, but with valid HTML you can reach your goal using CSS selectors that select every <div> that has a class attribute and is contained in your outer <div>:

soup.select('.alldata div[class]')
Example
from bs4 import BeautifulSoup

html='''<div class="alldata">
  <div class="data_from"></div>
  <div class="data_to"></div>
  <div class="data_to"></div>
  <div class="data_from"></div>
</div>'''

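# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')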

soup.select('.alldata div[class]')
Output
[<div class="data_from"></div>,
 <div class="data_to"></div>,
 <div class="data_to"></div>,
 <div class="data_from"></div>]
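
If you would rather stay with find_all(), roughly the same result can be had by locating the outer <div> first and then searching inside it. This is only a sketch, assuming a single "alldata" container and the html.parser parser; class_=True matches any tag that carries a class attribute.

```python
from bs4 import BeautifulSoup

html = '''<div class="alldata">
  <div class="data_from"></div>
  <div class="data_to"></div>
  <div class="data_to"></div>
  <div class="data_from"></div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Locate the container, then search only within it
outer = soup.find('div', class_='alldata')
inner = outer.find_all('div', class_=True) if outer else []

# 'class' is a multi-valued attribute, so each value is a list
class_names = [div['class'] for div in inner]
print(class_names)
```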

In addition, if you would like to get the text of each element, iterate over your ResultSet:

for e in soup.select('.alldata div[class]'):
    print(e.text)
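
To keep track of which class each piece of text came from, the class name can be paired with the text while iterating. The sketch below uses invented sample text ("Berlin", "Paris") and an assumed html.parser parser; get_text(strip=True) trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the text values are invented for illustration
html = '''<div class="alldata">
  <div class="data_from">Berlin</div>
  <div class="data_to">Paris</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for e in soup.select('.alldata div[class]'):
    # e['class'] is a list because 'class' is multi-valued; join it into one string
    rows.append((' '.join(e['class']), e.get_text(strip=True)))

print(rows)
```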
