Selecting the second child in Beautiful Soup with soup.select?

I have:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

Now, what's the easiest way to get Peter here if I already have the h2 tag? So far I've tried:

soup.select("#names > p:nth-child(1)")

but here I get a NotImplementedError for nth-child:

NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

So I'm not sure what's going on here. The second option was to get all the 'p' tag children and hard-select [1], but then there's a danger of an index out of range, which would require wrapping every attempt to get Peter in try/except, which is a bit silly.

Is there any way to select nth-child with the soup.select() function?

EDIT: replacing nth-child with nth-of-type seemed to do the trick, so the correct line is:

soup.select("#names > p:nth-of-type(1)")

Not sure why it doesn't accept nth-child, but it seems that both nth-child and nth-of-type return the same results here.

Adding your edit as an answer so that it can be more easily found by others:

Use nth-of-type instead of nth-child:

soup.select("#names > p:nth-of-type(1)")

'nth-child' is simply not implemented in beautifulsoup4 (at time of writing); there is no code in the beautifulsoup codebase to do it. The authors explicitly added the 'NotImplementedError' to explain this.
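The two pseudo-classes are not interchangeable in general, which is easy to see on a version that supports both (Beautiful Soup 4.7.0 or later). A minimal sketch, assuming a made-up wrapper div whose first child is the h2:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the h2 and both <p> tags share one parent.
html = "<div id='names'><h2>Names</h2><p>John</p><p>Peter</p></div>"
soup = BeautifulSoup(html, "html.parser")

# nth-child counts every element child, so child 1 is the <h2>, not a <p>.
by_child = soup.select("#names > p:nth-child(1)")

# nth-of-type counts only siblings with the same tag name.
by_type = soup.select("#names > p:nth-of-type(1)")

print(by_child)  # []
print(by_type)   # [<p>John</p>]
```

The two only agree when every sibling before the target has the same tag name.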

Given the HTML you quote in your question, you are not looking for a child of h2#names.

What you are really looking for is the second adjacent sibling. I'm not a CSS selector guru, but I found that this worked:

soup.select("#names + p + p")
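On the index-out-of-range worry from the question: soup.select_one() returns None when nothing matches, so no try/except is needed. A small sketch using the HTML from the question (the variable names are just for illustration):

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first match, or None if there is no match.
second = soup.select_one("#names + p + p")
missing = soup.select_one("#names + p + p + p")  # there is no third <p>

name = second.get_text() if second is not None else None
print(name)     # Peter
print(missing)  # None
```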

Beautiful Soup 4.7.0 (released at the beginning of 2019) now supports most selectors, including :nth-child:

As of version 4.7.0, Beautiful Soup supports most CSS4 selectors via the SoupSieve project. If you installed Beautiful Soup through pip, SoupSieve was installed at the same time, so you don't have to do anything extra.

So, if you upgrade your version:

pip install -U beautifulsoup4

You'll be able to use nearly all the selectors you'd ever need, including nth-child.
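If you are not sure which version you have, one way to check is to compare bs4.__version__ against 4.7. This is only a rough sketch; version_tuple is a helper made up for this example, not part of Beautiful Soup:

```python
import re

import bs4


def version_tuple(version):
    """Return (major, minor) from a version string such as '4.12.3'."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:2])


# :nth-child is available from Beautiful Soup 4.7.0 onwards.
supports_nth_child = version_tuple(bs4.__version__) >= (4, 7)
print(bs4.__version__, supports_nth_child)
```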

That said, note that in your input HTML, the #names h2 tag does not actually have any children:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

Here, there are just 3 elements, which are all siblings, so

#names > p:nth-child(1)

wouldn't work, even in CSS or JavaScript.
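You can confirm this quickly. A minimal check, assuming Beautiful Soup 4.7.0 or later so that :nth-child is accepted at all:

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")

# The h2 contains only text, so it has no element children at all.
children = soup.select("#names > *")
first_p_child = soup.select("#names > p:nth-child(1)")

print(children)       # []
print(first_p_child)  # []
```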

If the #names element had the <p>s as children, your selector would work, to an extent:

from bs4 import BeautifulSoup

html = '''
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
# Select the <p> that is the first element child of #names
print(soup.select("#names > p:nth-child(1)"))

Output:

[<p>John</p>]

Of course, the John <p> is the first child of the #names parent. If you want Peter, use :nth-child(2).
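For completeness, here is the :nth-child(2) variant on the same div-wrapped markup, again assuming Beautiful Soup 4.7.0 or later:

```python
from bs4 import BeautifulSoup

html = """
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Whitespace between tags is text, not an element, so it does not affect the count.
second_child = soup.select_one("#names > p:nth-child(2)")
print(second_child)  # <p>Peter</p>
```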

If the elements are all adjacent siblings, you can use + to select the next sibling:

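from bs4 import BeautifulSoup

html = '''
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
'''
soup = BeautifulSoup(html, 'html.parser')
# h2#names, then the <p> right after it, then the <p> right after that
print(soup.select("#names + p + p"))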

Output:

[<p>Peter</p>]
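If you already hold the h2 Tag object, as the question says, you can also navigate from it directly instead of using a selector. A sketch using find_next_siblings, with a length check in place of try/except; note that, unlike +, this matches any later <p> sibling, not only directly adjacent ones:

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")
h2 = soup.find(id="names")

# Collect at most two following <p> siblings of the h2.
siblings = h2.find_next_siblings("p", limit=2)

# Guard the index so a short document yields None rather than IndexError.
second = siblings[1] if len(siblings) > 1 else None
print(second)  # <p>Peter</p>
```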
