soup.select css nth of type?

I'm trying to select the 2nd column of the following HTML with Beautiful Soup:

<div class="parent">
  <div class="column">
      <div class="inventory">1</div>
      <div class="inventory">2</div>
      <div class="inventory">3</div>
  </div>
  <div class="column">
      <div class="inventory">4</div>
      <div class="inventory">5</div>
      <div class="inventory">6</div>
  </div>
  <div class="column">
      <div class="inventory">7</div>
      <div class="inventory">8</div>
      <div class="inventory">9</div>
  </div>
</div>

I'm using the CSS selector div.column + div to select the 2nd column. However, the code below iterates over the rows in both the 2nd and 3rd columns. I believe div.column + div isn't doing what I expect it to.

from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlSource, 'html.parser')
secondColumn = soup.select('div.column + div div.inventory')
for row in secondColumn:
    print(row)  # prints stuff about the row

Is there any way to iterate over only the rows of the 2nd column?

The result set is entirely correct for the given CSS: the third div also follows a div with the column class (the second div has that class, after all).
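To see why the adjacent sibling combinator returns two columns, here is a minimal sketch using a shortened copy of the question's markup (one row per column, and the `html.parser` backend is an assumption, not something the question specified). It lists the first cell of every div matched by `div.column + div`:

```python
from bs4 import BeautifulSoup

# Shortened version of the question's HTML: one inventory row per column.
html = """
<div class="parent">
  <div class="column"><div class="inventory">1</div></div>
  <div class="column"><div class="inventory">4</div></div>
  <div class="column"><div class="inventory">7</div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.column + div" matches any div whose immediately preceding sibling
# element is a div.column, so both the 2nd and the 3rd column qualify.
matched = soup.select("div.column + div")
first_cells = [col.select_one("div.inventory").get_text() for col in matched]
print(first_cells)  # ['4', '7']
```

Only the first column is excluded, because it has no preceding sibling at all.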

You'll have to find all column divs and pick out the second one from that result set:

soup.select("div > div.column")[1]

This'll only give you the one column, even if there are more such groups elsewhere in the document.

If you need the second column per parent, add a loop:

for parent in soup.select('div.parent'):
    column = parent.select('div.column')[1]
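As a rough sketch of how that loop might be extended to reach the rows themselves, the following collects the text of each inventory div in the second column of every parent. The length check and the `second_column_rows` name are illustrative additions, and `html.parser` is an assumed backend:

```python
from bs4 import BeautifulSoup

html = """
<div class="parent">
  <div class="column">
    <div class="inventory">1</div><div class="inventory">2</div><div class="inventory">3</div>
  </div>
  <div class="column">
    <div class="inventory">4</div><div class="inventory">5</div><div class="inventory">6</div>
  </div>
  <div class="column">
    <div class="inventory">7</div><div class="inventory">8</div><div class="inventory">9</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

second_column_rows = []
for parent in soup.select("div.parent"):
    columns = parent.select("div.column")
    if len(columns) < 2:
        continue  # skip parents without a second column instead of raising IndexError
    for row in columns[1].select("div.inventory"):
        second_column_rows.append(row.get_text(strip=True))

print(second_column_rows)  # ['4', '5', '6']
```

Note that `parent.select('div.column')` searches all descendants, so if columns can be nested inside other columns, a child selector may be a better fit.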

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div class="parent">
...   <div class="column">
...       <div class="inventory">1</div>
...       <div class="inventory">2</div>
...       <div class="inventory">3</div>
...   </div>
...   <div class="column">
...       <div class="inventory">4</div>
...       <div class="inventory">5</div>
...       <div class="inventory">6</div>
...   </div>
...   <div class="column">
...       <div class="inventory">7</div>
...       <div class="inventory">8</div>
...       <div class="inventory">9</div>
...   </div>
... </div>
... ''')
>>> soup.select("div.parent > div.column")[1]
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>
>>> for parent in soup.select('div.parent'):
...     column = parent.select('div.column')[1]
...     print(column)
... 
<div class="column">
<div class="inventory">4</div>
<div class="inventory">5</div>
<div class="inventory">6</div>
</div>

Beautiful Soup's find_all() also supports matching CSS classes directly, without a selector:

for parent in soup.find_all('div', 'parent'):
    second_column = parent('div', 'column')[1]
    # handle the second column
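Since the question title asks about nth-of-type, here is a hedged sketch of that route as well. It assumes a Beautiful Soup release whose `select()` handles `:nth-of-type()` in combination with a class selector (recent versions delegate selectors to the soupsieve package) and uses `html.parser` as the backend:

```python
from bs4 import BeautifulSoup

html = """
<div class="parent">
  <div class="column">
    <div class="inventory">1</div><div class="inventory">2</div><div class="inventory">3</div>
  </div>
  <div class="column">
    <div class="inventory">4</div><div class="inventory">5</div><div class="inventory">6</div>
  </div>
  <div class="column">
    <div class="inventory">7</div><div class="inventory">8</div><div class="inventory">9</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# :nth-of-type(2) picks the second <div> among its siblings. It counts by
# tag name, not by class, so this relies on the columns being the only
# div children of div.parent.
rows = soup.select("div.parent > div.column:nth-of-type(2) > div.inventory")
values = [row.get_text() for row in rows]
print(values)  # ['4', '5', '6']
```

If other div elements can appear before the columns inside a parent, the index shifts, and slicing the list of column divs as shown earlier is the safer choice.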
