How can I select only the links from "#0-9" and "A-Z" in BeautifulSoup?

Question

My URL is this:

https://en.wikipedia.org/wiki/List_of_South_Korean_dramas

This works for selecting all the links from A to Z.

    link = s.get(url)
    link_soup = BeautifulSoup(link.text, 'lxml')
    links = (
        link_soup
        .select_one('#A')
        .parent
        .find_next_sibling("ul")
        .find_all("a", href=True)
    )

But when I try select_one with #0-9

....

        link_soup
        .select_one('#0-9')
        .parent
        .find_next_sibling("ul")
        .find_all("a", href=True)
    )

I get this error:

SelectorSyntaxError: Malformed id selector at position 0
  line 1:
#0-9
^

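How can I select only the links from "#0-9" and "A-Z"? I know I could just use a for loop, use re to change the end of the URL, and scrape the links manually from there, but is there a way to get the same result using select or bs4?

Thanks again for your help.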

Answer 1

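To answer the direct question: you can use an attribute = value CSS selector to specify the id attribute and its value. The digits are inside the "" quotes, so they do not cause a problem for the selector parser (a CSS id selector such as #0-9 is rejected because an identifier cannot start with a digit).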

link_soup.select('[id="0-9"]')
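The following is a minimal sketch of that difference. It does not fetch Wikipedia; the HTML string, the /wiki/Example_1 and /wiki/Example_A hrefs, and the use of the built-in html.parser instead of lxml are assumptions made so the snippet runs offline. The heading-then-list layout is only modelled on the traversal used in the question.

```python
from bs4 import BeautifulSoup
from soupsieve import SelectorSyntaxError

# Hypothetical stand-in for the page: a heading containing an element
# with the id, followed by a <ul> of links.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/Example_1">Example 1</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/Example_A">Example A</a></li></ul>
"""
link_soup = BeautifulSoup(html, "html.parser")

# The plain id selector is rejected because the identifier starts with a digit.
try:
    link_soup.select_one("#0-9")
    id_selector_failed = False
except SelectorSyntaxError:
    id_selector_failed = True

# The attribute selector quotes the value, so the leading digit is fine.
heading = link_soup.select_one('[id="0-9"]')
links = heading.parent.find_next_sibling("ul").find_all("a", href=True)
hrefs = [a["href"] for a in links]
print(id_selector_failed, hrefs)
```

The rest of the chain from the question (.parent, .find_next_sibling, .find_all) is unchanged; only the selector string differs.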

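Or escape the leading digit using its Unicode code point (in this case the trailing space after the escape is not needed, and the escape can be abbreviated to \30):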

link_soup.select('#\\30-9')
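As a rough illustration of the escaping, the sketch below compares three spellings of the same selector on a one-line, made-up HTML fragment (an assumption, not the real page). It also uses soupsieve.escape, the helper shipped with the selector library that bs4 relies on, to build the escaped identifier instead of writing it by hand.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Made-up fragment; only the id matters here.
link_soup = BeautifulSoup('<h2><span id="0-9">0-9</span></h2>', "html.parser")

# "\\30" in a normal string and r"\30" in a raw string are the same CSS text.
by_escaped = link_soup.select_one("#\\30-9")
by_raw = link_soup.select_one(r"#\30-9")

# Let the library produce the escaped identifier for an arbitrary id value.
by_helper = link_soup.select_one("#" + sv.escape("0-9"))

same_element = by_escaped is by_raw is by_helper
print(by_escaped["id"], same_element)
```

The helper is mainly useful when the id comes from data rather than being typed into the source.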

However, you can specify a single pattern that extracts all the links in one go, without any extra traversal up and down the DOM.

links = ['https://en.wikipedia.org' + i['href'] for i in link_soup.select('h2:not(:has(#See_also)) + ul a')]
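To see what that pattern keeps and what it drops, here is a small offline sketch. The HTML is a hypothetical reduction of the page (headings that contain the id, each directly followed by a list), and the Example_* and Other_list hrefs are invented; the live page markup may differ, in which case the selector would need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical reduction of the page: three sections, the last one "See also".
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/Example_1">Example 1</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/Example_A">Example A</a></li></ul>
<h2><span id="See_also">See also</span></h2>
<ul><li><a href="/wiki/Other_list">Other list</a></li></ul>
"""
link_soup = BeautifulSoup(html, "html.parser")

# h2:not(:has(#See_also))  -> every h2 except the one containing #See_also
# + ul                     -> the list immediately following that heading
# a                        -> every link inside that list
selector = "h2:not(:has(#See_also)) + ul a"
links = ["https://en.wikipedia.org" + a["href"] for a in link_soup.select(selector)]
print(links)
```

Because the selector matches on heading structure rather than on a specific id, the "0-9" section needs no special handling here.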

Solution 1
1 accepted 2022-06-19 04:57:00