Beautiful Soup trouble getting nested HTML
Using Beautiful Soup, I can't seem to capture all of the HTML elements. Specifically, I am trying to get the value 4 in
<button class="css-812ha7 " type="button">4</button>
but I am having trouble doing so with Beautiful Soup because I can't capture the nested tags.
Code:
soup.select('.css-rs2cuv')
Returns:
[
<div class="css-rs2cuv">
<button class="css-rzdbbc" type="button">
<svg class="css-1jc5boz" viewbox="0 95 57 95">
<path d="M57 142.5L9.5"></path>
</svg>
</button>
<button class="css-rzdbbc" type="button">
<svg class="css-15yx468" viewbox="0 95 57 95">
<path d="M57 142.5L9.5 95 0 104.5l38"></path>
</svg>
</button>
</div>
]
I thought my line of code would return all the tags and nested tags, and then I could run more methods to grab my desired value.
HTML I am parsing:
<div class="css-rs2cuv">
<button class="css-rzdbbc" type="button">
<svg viewBox="0 95 57 95" class="css-1jc5boz">
<path d="M57 142.5L9.5"></path>
</svg>
</button>
<button class="css-10po51q " type="button">1</button>
<button class="css-812ha7 " type="button">2</button>
<button class="css-812ha7 " type="button">3</button>
<div class="css-ufx8pa " data-comp="Flex Box">...</div>
<button class="css-812ha7 " type="button">4</button>
<button class="css-mnn3vx " type="button">
<svg viewBox="0 95 57 95" class="css-15yx468 ">
<path d="M57 142.5L9.5 95 0 104.5l38"></path>
</svg>
</button>
</div>
select returns a list of all tags matching the given CSS selector. You can index into this list to get the tag you need and then use .text to get the text inside.
from bs4 import BeautifulSoup
html="""
<div class="css-rs2cuv">
<button class="css-rzdbbc" type="button">
<svg viewBox="0 95 57 95" class="css-1jc5boz">
<path d="M57 142.5L9.5"></path>
</svg>
</button>
<button class="css-10po51q " type="button">1</button>
<button class="css-812ha7 " type="button">2</button>
<button class="css-812ha7 " type="button">3</button>
<div class="css-ufx8pa " data-comp="Flex Box">...</div>
<button class="css-812ha7 " type="button">4</button>
<button class="css-mnn3vx " type="button">
<svg viewBox="0 95 57 95" class="css-15yx468 ">
<path d="M57 142.5L9.5 95 0 104.5l38"></path>
</svg>
</button>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
# select() returns a list; the third .css-812ha7 button (index 2) holds "4"
print(soup.select('.css-812ha7')[2].text)
Output:
4
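To see why index 2 is the right one here, it can help to list the text of every match first. The following is a minimal sketch on a trimmed copy of the HTML above; the variable names matches and texts are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML from the question (SVG buttons omitted)
html = """
<div class="css-rs2cuv">
<button class="css-10po51q " type="button">1</button>
<button class="css-812ha7 " type="button">2</button>
<button class="css-812ha7 " type="button">3</button>
<div class="css-ufx8pa " data-comp="Flex Box">...</div>
<button class="css-812ha7 " type="button">4</button>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns every element carrying the class, in document order
matches = soup.select(".css-812ha7")

# get_text(strip=True) returns the element text without surrounding whitespace
texts = [tag.get_text(strip=True) for tag in matches]

print(texts)     # ['2', '3', '4']
print(texts[2])  # 4
```

Note that a positional index like [2] depends on how many buttons share the class, so it may break if the page adds or removes buttons.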
There is not enough HTML to tell whether you would need select or select_one (select_one returns only the first match). For the HTML shown, though, you can use the relationship between the desired element and the element just before it: match the preceding element with an attribute=value selector, [data-comp='Flex Box'], and join it with an adjacent sibling combinator to the class of the element you want to grab. The + is the adjacent sibling combinator.
If this CSS selector combination has multiple matches and the one you want is not the first, use select to retrieve all matches; you can then index into the result to retrieve a specific item.
In this scenario, using the class name alone as a selector would almost certainly be quicker, but it is worth being aware of other methods.
from bs4 import BeautifulSoup
html="""
<div class="css-rs2cuv">
<button class="css-rzdbbc" type="button">
<svg viewBox="0 95 57 95" class="css-1jc5boz">
<path d="M57 142.5L9.5"></path>
</svg>
</button>
<button class="css-10po51q " type="button">1</button>
<button class="css-812ha7 " type="button">2</button>
<button class="css-812ha7 " type="button">3</button>
<div class="css-ufx8pa " data-comp="Flex Box">...</div>
<button class="css-812ha7 " type="button">4</button>
<button class="css-mnn3vx " type="button">
<svg viewBox="0 95 57 95" class="css-15yx468 ">
<path d="M57 142.5L9.5 95 0 104.5l38"></path>
</svg>
</button>
</div>
"""
soup = BeautifulSoup(html,'lxml')
print(soup.select_one("[data-comp='Flex Box'] + .css-812ha7").text)
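For the multiple-match case described above, a rough sketch might look like the following. The markup is hypothetical: a second "Flex Box" div and a button with the text 9 were added purely for illustration, and html.parser is used instead of lxml only so that the sketch has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two "Flex Box" dividers, each followed by a button
html = """
<div class="css-rs2cuv">
<button class="css-812ha7 " type="button">2</button>
<div class="css-ufx8pa " data-comp="Flex Box">...</div>
<button class="css-812ha7 " type="button">4</button>
<div class="css-ufx8pa " data-comp="Flex Box">...</div>
<button class="css-812ha7 " type="button">9</button>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
selector = "[data-comp='Flex Box'] + .css-812ha7"

# select_one() returns only the first element matching the selector
first_text = soup.select_one(selector).get_text(strip=True)

# select() returns all matches as a list, which can then be indexed
all_texts = [tag.get_text(strip=True) for tag in soup.select(selector)]

print(first_text)     # 4
print(all_texts)      # ['4', '9']
print(all_texts[1])   # 9
```

The button containing 2 is not matched because it comes before a "Flex Box" div rather than immediately after one.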