BeautifulSoup doesn't find all the content

Question

I am trying to scrape some data from a web page. The data I want is laid out like this:

<div id="pagetitle">
      <a href="some_link">some_text</a>
      "some_text2"
      <a href="some_link2">some_text3</a>
</div>

I am trying to get some_text3, and I am using this code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page, "html5lib")  # page holds the downloaded HTML

author = soup.find('div', {'id' : 'pagetitle'}).a.string

print(author)

When I do this, I only get some_text. I also tried:

author = soup.find_all('a', {'id' : 'pagetitle'})

But I get an empty list. I also tried:

author = soup.find(id='pagetitle').prettify()

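That returns the whole block of markup, but I don't know how to get only some_text3. I also tried different parsers, but none of them worked. Sorry if this is hard to understand; this is only my second question, and I would welcome any suggestions.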

Answer 1

You can use a CSS selector with :nth-last-child(). For example:

from bs4 import BeautifulSoup


html_doc = """
<div id="pagetitle">
      <a href="some_link">some_text</a>
      "some_text2"
      <a href="some_link2">some_text3</a>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")

txt = soup.select_one("#pagetitle > a:nth-last-child(1)").text
print(txt)

Prints:

some_text3
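
One caveat: :nth-last-child(1) only matches when the <a> is the very last element child of the div. The following is a minimal sketch, assuming a hypothetical variant of the markup with a trailing <span> (which is not in the page from the question), where a:last-of-type still picks the last link:

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's markup: a <span> follows the last <a>.
html_variant = """
<div id="pagetitle">
      <a href="some_link">some_text</a>
      "some_text2"
      <a href="some_link2">some_text3</a>
      <span>extra</span>
</div>"""

variant_soup = BeautifulSoup(html_variant, "html.parser")

# The last element child is now the <span>, so no <a> satisfies :nth-last-child(1).
missing = variant_soup.select_one("#pagetitle > a:nth-last-child(1)")

# :last-of-type only compares against sibling <a> elements, so it still matches.
last_link = variant_soup.select_one("#pagetitle > a:last-of-type")

print(missing)         # None
print(last_link.text)  # some_text3
```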

Or use [-1] to get the last element:

txt = soup.select("#pagetitle a")[-1].text
print(txt)
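
For reference, here is a short sketch of why the attempts in the question behaved as they did, plus a find_all() variant of the same idea. It assumes the same sample markup as above; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html_doc = """
<div id="pagetitle">
      <a href="some_link">some_text</a>
      "some_text2"
      <a href="some_link2">some_text3</a>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")
title_div = soup.find("div", {"id": "pagetitle"})

# .a is shorthand for .find("a"), which returns only the first <a> it finds.
first = title_div.a.string

# The id is on the <div>, not on any <a>, so this search matches nothing.
no_match = soup.find_all("a", {"id": "pagetitle"})

# Collect every <a> inside the div, then take the last one.
links = title_div.find_all("a")
author = links[-1].get_text(strip=True)

print(first)     # some_text
print(no_match)  # []
print(author)    # some_text3
```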
