
Beautiful Soup trouble getting nested HTML

Using Beautiful Soup, I can't seem to capture all of the HTML elements. Specifically, I am trying to get the value 4 from

    <button class="css-812ha7 " type="button">4</button>

but I am having trouble doing so with Beautiful Soup because I can't capture the nested tags.

Code:

soup.select('.css-rs2cuv')

Returns:

[
     <div class="css-rs2cuv">
         <button class="css-rzdbbc" type="button">
             <svg class="css-1jc5boz" viewbox="0 95 57 95">
                 <path d="M57 142.5L9.5"></path>
             </svg>
         </button>
         <button class="css-rzdbbc" type="button">
             <svg class="css-15yx468" viewbox="0 95 57 95">
                 <path d="M57 142.5L9.5 95 0 104.5l38"></path>
             </svg>
         </button>
     </div>
]

I thought my line of code would return all the tags and nested tags, and that I could then run more methods to grab my desired value.

HTML I am parsing:

<div class="css-rs2cuv">
    <button class="css-rzdbbc" type="button">
        <svg viewBox="0 95 57 95" class="css-1jc5boz">
             <path d="M57 142.5L9.5"></path>
        </svg>
    </button>
    <button class="css-10po51q " type="button">1</button>
    <button class="css-812ha7 " type="button">2</button>
    <button class="css-812ha7 " type="button">3</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">4</button>
    <button class="css-mnn3vx " type="button">
        <svg viewBox="0 95 57 95" class="css-15yx468 ">
            <path d="M57 142.5L9.5 95 0 104.5l38"></path>
        </svg>
    </button>
</div>
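As a quick check of what select returns for this markup, here is a minimal sketch that parses the HTML above as a string (with the elided inner div left as `...`) and lists what the matched container holds. The variable names are illustrative. If the same check on the live page shows fewer children, that may suggest the markup Beautiful Soup received differs from what the browser displays, but that is an assumption this snippet cannot confirm.

```python
from bs4 import BeautifulSoup

html = """
<div class="css-rs2cuv">
    <button class="css-rzdbbc" type="button">
        <svg viewBox="0 95 57 95" class="css-1jc5boz">
             <path d="M57 142.5L9.5"></path>
        </svg>
    </button>
    <button class="css-10po51q " type="button">1</button>
    <button class="css-812ha7 " type="button">2</button>
    <button class="css-812ha7 " type="button">3</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">4</button>
    <button class="css-mnn3vx " type="button">
        <svg viewBox="0 95 57 95" class="css-15yx468 ">
            <path d="M57 142.5L9.5 95 0 104.5l38"></path>
        </svg>
    </button>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.select_one(".css-rs2cuv")

# Direct child tags only (recursive=False skips grandchildren such as <svg>).
children = container.find_all(recursive=False)
print(len(children))

# Text of every <button> inside the container; icon-only buttons give ''.
button_texts = [b.get_text(strip=True) for b in container.select("button")]
print(button_texts)
```

For the string above, the container keeps all of its nested tags, so the numbered buttons are reachable from the result of a single select call.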

select returns a list of all tags matching the given CSS selector. You can index into this list to get the tag you need and then use .text to get the text inside it.

from bs4 import BeautifulSoup
html="""
<div class="css-rs2cuv">
    <button class="css-rzdbbc" type="button">
        <svg viewBox="0 95 57 95" class="css-1jc5boz">
             <path d="M57 142.5L9.5"></path>
        </svg>
    </button>
    <button class="css-10po51q " type="button">1</button>
    <button class="css-812ha7 " type="button">2</button>
    <button class="css-812ha7 " type="button">3</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">4</button>
    <button class="css-mnn3vx " type="button">
        <svg viewBox="0 95 57 95" class="css-15yx468 ">
            <path d="M57 142.5L9.5 95 0 104.5l38"></path>
        </svg>
    </button>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
# select() returns a list of matches; index 2 is the third .css-812ha7 button
print(soup.select('.css-812ha7')[2].text)

Output:

4
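Indexing with a fixed position raises IndexError when the page has fewer matches than expected. A minimal sketch of a guarded variant follows, using a trimmed copy of the markup; `nth_text` is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div class="css-rs2cuv">
    <button class="css-10po51q " type="button">1</button>
    <button class="css-812ha7 " type="button">2</button>
    <button class="css-812ha7 " type="button">3</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">4</button>
</div>
"""


def nth_text(soup, selector, index):
    """Hypothetical helper: text of the index-th match, or None if out of range."""
    matches = soup.select(selector)
    if index >= len(matches) or index < -len(matches):
        return None
    # get_text(strip=True) trims surrounding whitespace from the text.
    return matches[index].get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
print(nth_text(soup, ".css-812ha7", 2))  # third matching button
print(nth_text(soup, ".css-812ha7", 5))  # no such match, so None
```

Returning None instead of raising is a design choice for this sketch; raising a descriptive error may suit a scraper that should fail loudly when the page layout changes.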

There is not enough HTML to tell whether you would need select or select_one (select_one returns the first match), but for the HTML shown you can use the relationship between the desired element and the element just before it: specify an attribute = value selector, [data-comp='Flex Box'], in an adjacent sibling combination with the class of the element you want to grab. The + is an adjacent sibling combinator.

If there are multiple matches for this CSS selector combination and the one you want is not the first, select can be used to retrieve all matches; you can then index into the result to retrieve a specific item.

In this scenario, using the class name alone as a selector would almost certainly be quicker, but it is worth being aware of other methods.

from bs4 import BeautifulSoup
html="""
<div class="css-rs2cuv">
    <button class="css-rzdbbc" type="button">
        <svg viewBox="0 95 57 95" class="css-1jc5boz">
             <path d="M57 142.5L9.5"></path>
        </svg>
    </button>
    <button class="css-10po51q " type="button">1</button>
    <button class="css-812ha7 " type="button">2</button>
    <button class="css-812ha7 " type="button">3</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">4</button>
    <button class="css-mnn3vx " type="button">
        <svg viewBox="0 95 57 95" class="css-15yx468 ">
            <path d="M57 142.5L9.5 95 0 104.5l38"></path>
        </svg>
    </button>
</div>
"""
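soup = BeautifulSoup(html, 'lxml')  # needs the lxml package installed; 'html.parser' also works here
# select_one() returns the first .css-812ha7 element that immediately follows the Flex Box div
print(soup.select_one("[data-comp='Flex Box'] + .css-812ha7").text)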
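To illustrate the multiple-match case mentioned above, here is a sketch on made-up markup: the second Flex Box div and the button labelled 7 are invented for illustration and do not come from the question. It compares select_one with select for the same adjacent sibling selector.

```python
from bs4 import BeautifulSoup

# Made-up markup: two "Flex Box" divs, each directly followed by a button.
html = """
<div class="css-rs2cuv">
    <button class="css-812ha7 " type="button">3</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">4</button>
    <div class="css-ufx8pa " data-comp="Flex Box">...</div>
    <button class="css-812ha7 " type="button">7</button>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
selector = "[data-comp='Flex Box'] + .css-812ha7"

first = soup.select_one(selector)   # first match only
all_matches = soup.select(selector) # every match, in document order
texts = [tag.text for tag in all_matches]

print(first.text)
print(texts)
print(all_matches[1].text)          # index to pick a later match
```

The button labelled 3 is not matched because it precedes the Flex Box div; the + combinator only selects an element that immediately follows its sibling.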
