
soup.select: CSS nth-of-type?

I'm trying to select the 2nd column of the following HTML with Beautiful Soup:

<div class="parent">
  <div class="column">
      <div class="inventory">1</div>
      <div class="inventory">2</div>
      <div class="inventory">3</div>
  </div>
  <div class="column">
      <div class="inventory">4</div>
      <div class="inventory">5</div>
      <div class="inventory">6</div>
  </div>
  <div class="column">
      <div class="inventory">7</div>
      <div class="inventory">8</div>
      <div class="inventory">9</div>
  </div>
</div>

I'm using the CSS selector div.column + div to select the 2nd column. However, the code below iterates over the rows in both the 2nd and 3rd columns. I believe div.column + div isn't doing what I expect.

from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlSource, 'html.parser')
secondColumn = soup.select('div.column + div div.inventory')
for row in secondColumn:
    print(row)  # prints stuff about the row

Is there any way I can only iterate over the rows of the 2nd column?

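The result set is entirely correct for the given CSS: the adjacent sibling combinator (+) matches any div that immediately follows a div with the column class. The third div qualifies too, because the second div, which precedes it, also has that class.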
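To make that concrete, here is a minimal sketch that runs the sibling selector against a trimmed-down copy of the markup (one row per column, and the html.parser backend is my choice, not the asker's). It should show that both the 2nd and 3rd columns match:

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the question's markup: one row per column.
html = """
<div class="parent">
  <div class="column"><div class="inventory">1</div></div>
  <div class="column"><div class="inventory">4</div></div>
  <div class="column"><div class="inventory">7</div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.column + div" selects every div whose immediately preceding
# sibling element is a div.column, so columns 2 and 3 both qualify.
matched = soup.select("div.column + div")
first_rows = [col.find("div", "inventory").get_text() for col in matched]
print(first_rows)  # ['4', '7']
```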

You'll have to find all column divs and just pick out the second one from that result set:

soup.select("div > div.column")[1]

This gives you only one column for the whole document, even if there are more such groups elsewhere.

If you need the second column per parent, add a loop:

for parent in soup.select('div.parent'):
    column = parent.select('div.column')[1]

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div class="parent">
...   <div class="column">
...       <div class="inventory">1</div>
...       <div class="inventory">2</div>
...       <div class="inventory">3</div>
...   </div>
...   <div class="column">
...       <div class="inventory">4</div>
...       <div class="inventory">5</div>
...       <div class="inventory">6</div>
...   </div>
...   <div class="column">
...       <div class="inventory">7</div>
...       <div class="inventory">8</div>
...       <div class="inventory">9</div>
...   </div>
... </div>
... ''')
>>> soup.select("div.parent > div.column")[1]
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
>>> for parent in soup.select('div.parent'):
...     column = parent.select('div.column')[1]
...     print(column)
... 
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
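The question asked for the rows rather than the column itself, so the loop can be taken one step further. The following is only a sketch: the helper name second_column_rows is hypothetical, the html.parser backend is an assumption, and the length check is an extra guard for parents that have fewer than two columns.

```python
from bs4 import BeautifulSoup

def second_column_rows(soup):
    """Yield each div.inventory inside the second div.column of every div.parent."""
    for parent in soup.select("div.parent"):
        columns = parent.select("div.column")
        if len(columns) < 2:
            continue  # this parent has no second column
        for row in columns[1].select("div.inventory"):
            yield row

html = """
<div class="parent">
  <div class="column">
      <div class="inventory">1</div>
      <div class="inventory">2</div>
      <div class="inventory">3</div>
  </div>
  <div class="column">
      <div class="inventory">4</div>
      <div class="inventory">5</div>
      <div class="inventory">6</div>
  </div>
  <div class="column">
      <div class="inventory">7</div>
      <div class="inventory">8</div>
      <div class="inventory">9</div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
texts = [row.get_text() for row in second_column_rows(soup)]
print(texts)  # ['4', '5', '6']
```

Note that parent.select('div.column') searches all descendants, so this assumes columns are not nested inside other columns.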

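Beautiful Soup also supports searching by CSS class directly, without select(); a string passed as the second argument to find_all is treated as a class name: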

for parent in soup.find_all('div', 'parent'):
    second_column = parent('div', 'column')[1]
    # handle the second column
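Since the title mentions nth-of-type, it may be worth noting that select() can express the position in the selector itself. This is a sketch that assumes a recent Beautiful Soup release (where select() is backed by the soupsieve selector engine) and the html.parser backend. Keep in mind that :nth-of-type counts sibling elements by tag name, not by class, so it only lines up with "the second column" when no other div siblings precede the columns.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent">
  <div class="column">
      <div class="inventory">1</div>
      <div class="inventory">2</div>
      <div class="inventory">3</div>
  </div>
  <div class="column">
      <div class="inventory">4</div>
      <div class="inventory">5</div>
      <div class="inventory">6</div>
  </div>
  <div class="column">
      <div class="inventory">7</div>
      <div class="inventory">8</div>
      <div class="inventory">9</div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Second div child of each div.parent (if it has the column class),
# then the inventory rows directly inside it.
rows = soup.select("div.parent > div.column:nth-of-type(2) > div.inventory")
print([row.get_text() for row in rows])  # ['4', '5', '6']
```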
