BeautifulSoup: scraping a specific table from a page with multiple tables

Question

import requests
from bs4 import BeautifulSoup

results = requests.get("https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists")

src = results.content

soup = BeautifulSoup(src, 'lxml')

trs = soup.find_all("tr")
for tr in trs:
    print(tr.text)

This is the code I wrote to scrape the tables from the page https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists

I only want the table in the section "List of most Olympic gold medals over career". How can I specify that table? The page has two tables with the classes sortable and jquery-tablesorter, so I cannot use the class attribute to select the one I need.

One more question: if I know that the page I am scraping contains a lot of tables, and the one I need always has 10 td elements in each row, can I use something like

for tr in soup.find_all("tr"):
    if len(tr.find_all("td")) == 10:
        print(tr)

to extract the data I want?

Update on code:

import requests
from bs4 import BeautifulSoup

results = requests.get("https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists")

src = results.content

soup = BeautifulSoup(src, 'lxml')

tbs = soup.find("tbody")
trs = tbs.find_all("tr")
for tr in trs:
    print(tr.text)

I have one solution, though not a good one: it just takes the first tbody on the page, which happens to belong to the table I need. Any suggestions or improvements are welcome!

Thank you.

Answer 1

To get only the first table, you can use the CSS selector nth-of-type(1) together with select_one, which returns the first element that matches:

import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists"

soup = BeautifulSoup(requests.get(URL).content, "html.parser")

table = soup.select_one("table.wikitable:nth-of-type(1)")
trs = table.find_all("tr")

for tr in trs:
    print(tr.text)
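
The question also asks how to pick the table by its section title and how to keep only rows with a fixed number of td cells. The following is a minimal sketch of one way to do both, not a definitive approach. It runs on a small hypothetical HTML sample (SAMPLE_HTML, with placeholder athlete names) so it works offline; the helper names table_after_heading and rows_with_n_cells are made up for illustration, and the assumption that section titles sit in h2 or h3 elements should be checked against the real page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a page with several tables.
SAMPLE_HTML = """
<h2>Introduction</h2>
<table class="wikitable sortable"><tr><td>unrelated</td></tr></table>
<h2>List of most Olympic gold medals over career</h2>
<table class="wikitable sortable">
  <tr><th>No.</th><th>Athlete</th><th>Gold</th></tr>
  <tr><td>1</td><td>Athlete A</td><td>5</td></tr>
  <tr><td>2</td><td>Athlete B</td><td>4</td></tr>
  <tr><td>footnote row</td></tr>
</table>
"""


def table_after_heading(soup, heading_text):
    """Return the first <table> after a heading containing heading_text, or None."""
    # Assumption: section titles are in <h2> or <h3> elements.
    for heading in soup.find_all(["h2", "h3"]):
        if heading_text in heading.get_text():
            # find_next searches forward in document order, so it also works
            # when the heading is wrapped in its own <div>.
            return heading.find_next("table")
    return None


def rows_with_n_cells(table, n):
    """Return the cell texts of every row that has exactly n <td> cells."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) == n:
            rows.append([td.get_text(strip=True) for td in cells])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = table_after_heading(soup, "List of most Olympic gold medals over career")
data_rows = rows_with_n_cells(table, 3)  # the sample uses 3 cells per row

for row in data_rows:
    print(row)
```

For the real page, you would build soup from requests.get(URL).content as in the answer above and pass 10 instead of 3. Header rows made only of th cells and rows with a different td count are skipped by the length check.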
