Beautiful Soup scrape text from second child in same div class

I have two <tr> tags within the same div class. The first tr tag prints the text just fine. I am trying to access the second tr tag within the container, but I can't seem to get it to work. Also note that not all containers have a second <tr> tag, so I need an if statement to check whether it exists first and, if it does, print its text. Thanks!

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div",{"class":"right"})

for container in containers:
    print(container.span.text)
    print(container.tr.text)

    if container.nextSiblings('tr')[1]:
        print(container.nextSiblings('tr')[1].text)

You can locate all the tr elements within a container and check how many of them you have:

for container in containers:
    trs = container("tr")  # same as container.find_all("tr")

    if len(trs) > 1:
        print(trs[1].get_text())
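
As a self-contained sketch of the same length check, the snippet below wraps it in a small helper. The helper name `second_row_text` and the sample markup are assumptions made for illustration; they are not part of the original question or answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the demo data in this answer.
html = """
<div class="right"><tr>container 1 row 1</tr><tr>container 1 row 2</tr></div>
<div class="right"><tr>container 2 row 1</tr></div>
"""


def second_row_text(container):
    """Return the text of the second <tr> in container, or None if absent."""
    trs = container.find_all("tr")
    return trs[1].get_text(strip=True) if len(trs) > 1 else None


page_soup = BeautifulSoup(html, "html.parser")
results = [second_row_text(c) for c in page_soup.find_all("div", class_="right")]
print(results)
```

Returning None for containers with a single row lets the caller decide whether to skip them or print a placeholder.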

You can also locate the second tr within every container directly, using a single CSS selector:

for tr in soup.select(".right > tr:nth-of-type(2)"):
    print(tr.get_text())

Demo:

from bs4 import BeautifulSoup


data = """
<body>
    <div class="right">
        <tr>container 1 row 1</tr>
        <tr>container 1 row 2</tr>
    </div>
    <div class="right">
        <tr>container 2 row 1</tr>
    </div>
    <div class="right">
        <tr>container 3 row 1</tr>
        <tr>container 3 row 2</tr>
        <tr>container 3 row 3</tr>
    </div>
</body>
"""

soup = BeautifulSoup(data, "html.parser")
for tr in soup.select(".right > tr:nth-of-type(2)"):
    print(tr.get_text())

would print:

container 1 row 2
container 3 row 2
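
The attempt in the question looked at the siblings of the container itself, which are the elements next to the div rather than the rows inside it. If a sibling-based approach is preferred, one possible sketch (not the method used in the answer above) is to start from the first tr and ask for its next tr sibling with `find_next_sibling`, which returns None when there is none. The sample markup and the `found` list are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the demo data above.
html = """
<div class="right"><tr>container 1 row 1</tr><tr>container 1 row 2</tr></div>
<div class="right"><tr>container 2 row 1</tr></div>
"""

page_soup = BeautifulSoup(html, "html.parser")

found = []
for container in page_soup.find_all("div", class_="right"):
    first = container.find("tr")
    # find_next_sibling returns None when no later <tr> sibling exists,
    # so the result doubles as the existence check.
    second = first.find_next_sibling("tr") if first is not None else None
    if second is not None:
        found.append(second.get_text(strip=True))

print(found)
```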
