
Nested Beautiful Soup classes

I am trying to fetch all classes (including the data inside "data_from", "data_to") from the following structure:

<div class="alldata">
  <div class="data_from">
  <div class="data_to">
  <div class="data_to">
  <div class="data_from">
</div>

So far I have tried finding all classes, without success. The "data_from", "data_to" classes are not being fetched by:

soup.find_all(class_=True)

When I try to iterate over the "alldata" class, I fetch only the first "data_from" class.

for data in soup.findAll('div', attrs={"class": "alldata"}):
    print(data.prettify())

All assistance is greatly appreciated. Thank you.

In new code, avoid the old findAll() syntax, and avoid mixing it with the new syntax; use find_all() only. For more, take a minute to check the docs.
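As a rough sketch of the find_all() spelling, assuming the inner <div> elements are properly closed: the sample text ("a", "b") and the variable names outer, inner and names are made up for illustration. Passing a list to class_ matches elements carrying any of the listed classes.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed version of the markup from the question
html = '''<div class="alldata">
  <div class="data_from">a</div>
  <div class="data_to">b</div>
</div>'''

# Naming the parser explicitly keeps results independent of installed parsers
soup = BeautifulSoup(html, 'html.parser')

# find_all() is the current spelling; findAll() is the legacy alias
outer = soup.find('div', class_='alldata')
inner = outer.find_all('div', class_=['data_from', 'data_to'])

# The class attribute is multi-valued, so it comes back as a list
names = [d['class'][0] for d in inner]
print(names)
```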


Your HTML is not valid (the inner <div> elements are never closed), but to reach your goal with valid HTML you could use a CSS selector that selects every <div> that has a class attribute and is contained in your outer <div>:

soup.select('.alldata div[class]')
Example
from bs4 import BeautifulSoup

html='''<div class="alldata">
  <div class="data_from"></div>
  <div class="data_to"></div>
  <div class="data_to"></div>
  <div class="data_from"></div>
</div>'''

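# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')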

soup.select('.alldata div[class]')
Output
[<div class="data_from"></div>,
 <div class="data_to"></div>,
 <div class="data_to"></div>,
 <div class="data_from"></div>]
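One thing that may matter here: the space in '.alldata div[class]' is the descendant combinator, so it also matches <div> elements nested more deeply. If only direct children are wanted, the child combinator '>' can be used instead. A small sketch of the difference, where the nested "note" <div> is a hypothetical addition and not part of the question's markup:

```python
from bs4 import BeautifulSoup

# "note" is a made-up nested element used only to show the difference
html = '''<div class="alldata">
  <div class="data_from"><div class="note"></div></div>
  <div class="data_to"></div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Descendant combinator (space): matches at any depth below .alldata
descendants = [d['class'][0] for d in soup.select('.alldata div[class]')]

# Child combinator (>): matches direct children of .alldata only
children = [d['class'][0] for d in soup.select('.alldata > div[class]')]

print(descendants)
print(children)
```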

In addition, if you would like to get the text of each element, iterate over the ResultSet:

for e in soup.select('.alldata div[class]'):
    print(e.text)
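If the class name is needed alongside the text, both can be collected in one pass. This is only a sketch: the city names are placeholder text invented for the example, and rows is a hypothetical variable name.

```python
from bs4 import BeautifulSoup

# Placeholder text content; the question's markup had empty elements
html = '''<div class="alldata">
  <div class="data_from">Berlin</div>
  <div class="data_to">Paris</div>
  <div class="data_to">Rome</div>
  <div class="data_from">Oslo</div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Pair each element's first class name with its whitespace-stripped text
rows = [
    (e['class'][0], e.get_text(strip=True))
    for e in soup.select('.alldata div[class]')
]

for class_name, text in rows:
    print(class_name, text)
```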
