I'm trying to select the 2nd column of the following HTML with Beautiful Soup:
<div class="parent">
<div class="column">
<div class="inventory">1</div>
<div class="inventory">2</div>
<div class="inventory">3</div>
</div>
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
<div class="column">
<div class="inventory">7</div>
<div class="inventory">8</div>
<div class="inventory">9</div>
</div>
</div>
I'm using the CSS idiom div.column + div to select the 2nd column. However, the code below iterates over the rows in both the 2nd and 3rd columns. I believe the logic div.column + div isn't doing what I expect it to.
from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlSource)
secondColumn = soup.select('div.column + div div.inventory')
for row in secondColumn:
    print(row)  # prints stuff about the row
Is there any way I can only iterate over the rows of the 2nd column?
The result set is entirely correct for the given CSS; the third div follows a div with the column class too (the second div has that class, after all).
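To make that concrete, here is a minimal sketch of what the adjacent sibling combinator matches. The trimmed-down HTML and the names html, matched and first_cells are assumptions made for illustration; they are not part of the question.

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the question's markup: one cell per column.
html = """
<div class="parent">
  <div class="column"><div class="inventory">1</div></div>
  <div class="column"><div class="inventory">4</div></div>
  <div class="column"><div class="inventory">7</div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# "A + B" matches every B whose immediately preceding sibling element
# matches A, so both the second and the third column qualify.
matched = soup.select("div.column + div")
first_cells = [col.select_one("div.inventory").get_text() for col in matched]
print(first_cells)  # ['4', '7']
```

Only the first column is left out, because it has no preceding sibling with the column class.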
You'll have to find all column divs and just pick out the second one from that result set:
soup.select("div > div.column")[1]
This'll only give you the one column, even if there are more such groups elsewhere in the document.
If you need the second column per parent, add a loop:
for parent in soup.select('div.parent'):
    column = parent.select('div.column')[1]
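Since the question asks about iterating over the rows, the loop can be extended along these lines. This is only a sketch: the length check and the rows list are additions for illustration, on the assumption that some parents might hold fewer than two columns.

```python
from bs4 import BeautifulSoup

html = """
<div class="parent">
  <div class="column">
    <div class="inventory">1</div>
    <div class="inventory">2</div>
    <div class="inventory">3</div>
  </div>
  <div class="column">
    <div class="inventory">4</div>
    <div class="inventory">5</div>
    <div class="inventory">6</div>
  </div>
  <div class="column">
    <div class="inventory">7</div>
    <div class="inventory">8</div>
    <div class="inventory">9</div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for parent in soup.select("div.parent"):
    columns = parent.select("div.column")
    if len(columns) < 2:
        continue  # this parent has no second column; skip it
    # Iterate over the rows of the second column only.
    for row in columns[1].select("div.inventory"):
        rows.append(row.get_text())

print(rows)  # ['4', '5', '6']
```

Without the guard, indexing with [1] raises an IndexError for any parent that has a single column.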
Demo:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div class="parent">
... <div class="column">
... <div class="inventory">1</div>
... <div class="inventory">2</div>
... <div class="inventory">3</div>
... </div>
... <div class="column">
... <div class="inventory">4</div>
... <div class="inventory">5</div>
... <div class="inventory">6</div>
... </div>
... <div class="column">
... <div class="inventory">7</div>
... <div class="inventory">8</div>
... <div class="inventory">9</div>
... </div>
... </div>
... ''')
>>> soup.select("div.parent > div.column")[1]
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
>>> for parent in soup.select('div.parent'):
...     column = parent.select('div.column')[1]
...     print(column)
...
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
BeautifulSoup supports CSS classes directly:
for parent in soup.find_all('div', 'parent'):
    second_column = parent('div', 'column')[1]
    # handle the second column
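The same idea can be spelled out with the class_ keyword, which some readers may find easier to follow than the positional form. This is a sketch only; the shortened HTML and the values list are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample: two cells per column.
html = """
<div class="parent">
  <div class="column">
    <div class="inventory">1</div>
    <div class="inventory">2</div>
  </div>
  <div class="column">
    <div class="inventory">4</div>
    <div class="inventory">5</div>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

for parent in soup.find_all("div", class_="parent"):
    # parent(...) is shorthand for parent.find_all(...); the second
    # positional argument is treated as a CSS class to match.
    second_column = parent.find_all("div", class_="column")[1]
    cells = second_column.find_all("div", class_="inventory")
    values = [cell.get_text() for cell in cells]
    print(values)  # ['4', '5']
```

Note that find_all searches all descendants by default, so this relies on column divs not being nested inside one another.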