
Selecting the second child in Beautiful Soup with soup.select?

I have:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

What's the easiest way to get Peter here if I already have the h2 tag? So far I've tried:

soup.select("#names > p:nth-child(1)")

but this raises a NotImplementedError for nth-child:

NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

So I'm not sure what's going on here. The second option is to get all the 'p' tags and take index [1], but that risks an IndexError when there are fewer than two of them, which would mean wrapping every attempt to get Peter in try/except. That seems a bit silly.

Is there any way to select nth-child with soup.select()?

EDIT: replacing nth-child with nth-of-type seemed to do the trick, so the correct line is:

soup.select("#names > p:nth-of-type(1)")

not sure why it doesn't accept nth-child but it seems that both nth-child and nth-of-type return the same results.

Adding your edit as an answer so that it can be more easily found by others:

Use nth-of-type instead of nth-child:

soup.select("#names > p:nth-of-type(1)")

'nth-child' is simply not implemented in beautifulsoup4 (at time of writing); there is no code in the Beautiful Soup codebase that handles it. The authors explicitly raise the 'NotImplementedError' to make this clear.

Given the HTML you quote in your question, you are not looking for a child of h2#names.

What you are really looking for is the second adjacent sibling. I'm not a CSS selector guru, but I found that this worked:

soup.select("#names + p + p")
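On the worry about index errors in the question: a minimal sketch, assuming the same HTML as in the question, of how select_one() can stand in for select(...)[0]. select_one() returns None when nothing matches, so a plain None check replaces the try/except. The variable names here are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match, or None when nothing matches,
# so no IndexError handling is needed.
second = soup.select_one("#names + p + p")
second_name = second.get_text() if second is not None else None

# There is no third <p> after the heading, so this lookup yields None.
missing = soup.select_one("#names + p + p + p")

print(second_name)
print(missing)
```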

Beautiful Soup 4.7.0 (released at the beginning of 2019) now supports most selectors, including :nth-child:

As of version 4.7.0, Beautiful Soup supports most CSS4 selectors via the SoupSieve project. If you installed Beautiful Soup through pip, SoupSieve was installed at the same time, so you don't have to do anything extra.

So, if you upgrade your version:

pip install -U beautifulsoup4

You'll be able to use nearly all selectors you'd ever need to, including nth-child.

That said, note that in your input HTML, the #names h2 tag does not actually have any children:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

Here, there are just 3 elements, which are all siblings, so

#names > p:nth-child(1)

wouldn't work, even in CSS or Javascript.
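To make that concrete, here is a small sketch (assuming Beautiful Soup 4.7.0 or later, so that :nth-child is accepted at all) that runs the selector from the question against the original sibling markup. Since no <p> is a child of the h2, the result is an empty list rather than an error.

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")

# The <p> tags are siblings of #names, not children, so nothing matches.
result = soup.select("#names > p:nth-child(1)")
print(result)
```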

If the #names element had the <p>s as children, your selector would work, to an extent:

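from bs4 import BeautifulSoup

html = '''
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
# John is the first element child of #names, so he matches :nth-child(1)
print(soup.select("#names > p:nth-child(1)"))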

Output:

[<p>John</p>]

Of course, the John <p> is the first child of the #names parent. If you want Peter, use :nth-child(2).
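The question's edit suggests that nth-child and nth-of-type return the same results. That only holds when every sibling has the same tag. A small sketch of the difference, assuming Beautiful Soup 4.7.0 or later and a hypothetical variant of the markup where the heading sits inside the container:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <h2> is now a child of #names alongside the <p> tags.
html = """
<div id='names'>
    <h2>Names</h2>
    <p>John</p>
    <p>Peter</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# :nth-child counts every element child of the parent, so the <h2> is child 1
# and the <p> in position 2 is John.
by_child = [p.get_text() for p in soup.select("#names > p:nth-child(2)")]

# :nth-of-type counts only siblings with the same tag name,
# so the second <p> is Peter.
by_type = [p.get_text() for p in soup.select("#names > p:nth-of-type(2)")]

print(by_child)
print(by_type)
```

So :nth-of-type(2) is usually the safer choice when the goal is "the second paragraph", regardless of what other elements share the parent.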

If the elements are all adjacent siblings, you can use + to select the next sibling:

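from bs4 import BeautifulSoup

html = '''
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
'''
soup = BeautifulSoup(html, 'html.parser')
# "+" matches the element immediately following the previous one
print(soup.select("#names + p + p"))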

Output:

[<p>Peter</p>]
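If you already hold the h2 Tag object, as the question mentions, a non-CSS alternative is find_next_siblings(). The helper below is only a sketch: nth_following_p is a made-up name, not part of Beautiful Soup, and unlike the "+" combinator it skips over any non-<p> siblings in between.

```python
from bs4 import BeautifulSoup


def nth_following_p(heading, n):
    """Return the text of the n-th <p> sibling after `heading` (1-based), or None."""
    # limit=n stops the search once n matching siblings have been found.
    siblings = heading.find_next_siblings("p", limit=n)
    return siblings[n - 1].get_text() if len(siblings) >= n else None


html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")
h2 = soup.find("h2", id="names")

print(nth_following_p(h2, 2))
print(nth_following_p(h2, 3))
```

The length check returns None when there are too few paragraphs, so callers don't need try/except around an index lookup.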
