Parsing all elements that come after a specific tag

I have the following HTML code:

<div class="1">
    <fieldset>
        <legend>AAA</legend>
        <div class="row">aaa</div>
        <div class="row">aaa</div>
        <div class="row">aaa</div>
        ...
    </fieldset>
</div>
<div class="1">
    <fieldset>
        <legend>BBB</legend>
        <div class="row">bbb</div>
        <div class="row">bbb</div>
        <div class="row">bbb</div>
        ...
    </fieldset>
</div>

I'm trying to print only the text of the rows that belong to the legend BBB (in this example: bbb, bbb, bbb).

So far I've written the code below, but it isn't pretty, and I don't know how to find all of the rows:

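from bs4 import BeautifulSoup

bs = BeautifulSoup(request.text, 'html.parser')
legend = bs.find('legend', text='BBB')
if legend:
    # Three hops: the legend's text "BBB", the whitespace after it, then the first div.row
    value = legend.next_element.next_element.next_element.get_text().strip()
    print(value)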

Is there a simpler way to do this? The div class names are the same in every block; only the legend text varies.
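The rows are siblings of the <legend> inside the same <fieldset>, not its children. next_element walks the document in parse order, text nodes included, so the hops in the chain above land on the legend's own text, then the whitespace after it, then the first row, and never reach the later rows. A minimal sketch, assuming markup shaped like the sample above (the shortened html string is illustrative only), that collects every row with find_next_siblings() instead:

```python
from bs4 import BeautifulSoup

# Illustrative sample input mirroring the question's structure (shortened).
html = """
<div class="1"><fieldset><legend>AAA</legend>
  <div class="row">aaa</div><div class="row">aaa</div></fieldset></div>
<div class="1"><fieldset><legend>BBB</legend>
  <div class="row">bbb</div><div class="row">bbb</div><div class="row">bbb</div></fieldset></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the legend by its exact text (string= is the newer name for text=).
legend = soup.find("legend", string="BBB")

# The rows follow the legend inside the same <fieldset>, so find_next_siblings()
# collects all of them; whitespace text nodes are skipped by the "div" filter.
rows = legend.find_next_siblings("div", class_="row") if legend else []
texts = [row.get_text(strip=True) for row in rows]
print(texts)
```

Because sibling search stops at the end of the parent, rows from the AAA block are never picked up.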

I added a third block with <legend>CCC</legend> so you can see that the approach scales:

html = """<div class="1">
    <fieldset>
          <legend>AAA</legend>
          <div class="row">aaa</div>
          <div class="row">aaa</div>
          <div class="row">aaa</div>
          ...
    </fieldset>
</div>

<div class="1">
    <fieldset>
          <legend>BBB</legend>
          <div class="row">bbb</div>
          <div class="row">bbb</div>
          <div class="row">bbb</div>
          ...
    </fieldset>
</div>

<div class="1">
    <fieldset>
          <legend>CCC</legend>
          <div class="row">ccc</div>
          <div class="row">ccc</div>
          <div class="row">ccc</div>
          ...
    </fieldset>
</div>"""

from bs4 import BeautifulSoup

bs = BeautifulSoup(html, "html.parser")
after_tag = bs.find("legend", text="BBB").parent    # The legend's parent is the <fieldset>.
divs = after_tag.find_all("div", {"class": "row"})  # Finds every div.row inside that fieldset.

for div in divs:
    print(div.text)
Output:

bbb
bbb
bbb
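To make the scaling point concrete, here is a sketch that builds a lookup from every legend text to its rows in one pass, so any block (AAA, BBB, CCC, ...) can be read without searching again. The helper name rows_by_legend and the trimmed sample HTML are hypothetical; it assumes each <fieldset> carries its label in its first <legend>.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed sample with three labelled blocks.
html = """
<div class="1"><fieldset><legend>AAA</legend>
  <div class="row">aaa</div><div class="row">aaa</div></fieldset></div>
<div class="1"><fieldset><legend>BBB</legend>
  <div class="row">bbb</div><div class="row">bbb</div></fieldset></div>
<div class="1"><fieldset><legend>CCC</legend>
  <div class="row">ccc</div></fieldset></div>
"""


def rows_by_legend(soup):
    """Map each fieldset's legend text to the texts of its div.row elements.

    Hypothetical helper: uses the first <legend> found in each fieldset.
    """
    result = {}
    for fieldset in soup.find_all("fieldset"):
        legend = fieldset.find("legend")
        if legend is None:
            continue  # skip fieldsets that have no legend
        key = legend.get_text(strip=True)
        result[key] = [div.get_text(strip=True)
                       for div in fieldset.find_all("div", class_="row")]
    return result


soup = BeautifulSoup(html, "html.parser")
mapping = rows_by_legend(soup)
print(mapping["BBB"])
```

If two fieldsets share the same legend text, the later one overwrites the earlier entry in the dict.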
Another answer, using CSS selectors:

from bs4 import BeautifulSoup

html = """
<div class="1">
    <fieldset>
          <legend>AAA</legend>
          <div class="row">aaa</div>
          <div class="row">aaa</div>
          <div class="row">aaa</div>
          ...
    </fieldset>
</div>

<div class="1">
    <fieldset>
          <legend>BBB</legend>
          <div class="row">bbb</div>
          <div class="row">bbb</div>
          <div class="row">bbb</div>
          ...
    </fieldset>
</div>
"""

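soup = BeautifulSoup(html, features='html.parser')
# [1] selects the second <fieldset> by position (the BBB block in this HTML),
# not by its legend text.
elements = soup.select('div > fieldset')[1]

# Collect the text of every div.row inside that fieldset into a tuple.
tuple_obj = tuple(row.text for row in elements.select('div.row'))
print(tuple_obj)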

Printing the tuple gives:

('bbb', 'bbb', 'bbb')
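The [1] index ties the result to the fieldset's position, so it breaks if the blocks are reordered or another block is added before BBB. A hedged alternative, assuming a BeautifulSoup install whose select() is backed by soupsieve 2.1 or newer (needed for :-soup-contains()), chooses the fieldset by its legend text inside the selector itself; the shortened html string is illustrative:

```python
from bs4 import BeautifulSoup

# Illustrative sample mirroring the answer's HTML (shortened).
html = """
<div class="1"><fieldset><legend>AAA</legend>
  <div class="row">aaa</div><div class="row">aaa</div></fieldset></div>
<div class="1"><fieldset><legend>BBB</legend>
  <div class="row">bbb</div><div class="row">bbb</div><div class="row">bbb</div></fieldset></div>
"""

soup = BeautifulSoup(html, "html.parser")

# :has() comes from CSS Selectors Level 4; :-soup-contains() is a soupsieve
# extension. Together they pick the fieldset whose direct <legend> child
# contains "BBB", then take that fieldset's div.row children.
selector = 'fieldset:has(> legend:-soup-contains("BBB")) > div.row'
tuple_obj = tuple(row.get_text(strip=True) for row in soup.select(selector))
print(tuple_obj)
```

Because :-soup-contains() matches substrings, "BBB" would also match a legend such as "BBB2"; an exact comparison in Python (for example, checking fieldset.legend.get_text(strip=True) == "BBB") is safer when legend texts can overlap.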
