Using pd.read_html to return a specific table from a webpage of multiple tables
Is there a way to return one specific table from a webpage that has multiple tables in Python?
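I can't return one specific table (the one titled "BRN Substantial Shareholders") from this webpage: https://www.intelligentinvestor.com.au/shares/asx-brn/brainchip-holdings-ltd
I can return all of the tables with the following code: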
from bs4 import BeautifulSoup

# driver is a Selenium WebDriver that has already loaded the page
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
all_tables = soup.find_all('table')
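Because all_tables is already a list of Tag objects, one way to keep a single table is to filter that list on the table's own text. The sketch below assumes the target table contains the word "Holding" (the same string the pd.read_html approach matches on). The HTML is a made-up stand-in for driver.page_source with placeholder values, and first_table_containing is a hypothetical helper name, not a BeautifulSoup function.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source with placeholder values;
# the real page's markup will differ.
html = """
<div><h3>BRN Financials</h3>
<table><tr><th>Year</th><th>Revenue</th></tr><tr><td>2021</td><td>1.0</td></tr></table></div>
<div><h3>BRN Substantial Shareholders</h3>
<table><tr><th>Holder</th><th>Holding</th></tr><tr><td>Holder A</td><td>5.0</td></tr></table></div>
"""

soup = BeautifulSoup(html, "html.parser")
all_tables = soup.find_all("table")


def first_table_containing(tables, text):
    """Return the first table whose text contains `text`, or None if none does."""
    return next((t for t in tables if text in t.get_text()), None)


table = first_table_containing(all_tables, "Holding")
# Header cell texts, then one list of cell texts per data row (rows without <td> are skipped).
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")
    if tr.find_all("td")
]
print(headers)  # ['Holder', 'Holding']
print(rows)     # [['Holder A', '5.0']]
```

Matching on a word that only appears in the wanted table keeps this independent of where the table sits on the page, but it will pick the wrong table if another one happens to contain the same word.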
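I have tried two different approaches to scrape it with BeautifulSoup, but I can't seem to find a way. Am I doing something wrong? Both outputs are an empty list.
Method 1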
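# Scrape the substantial holder list
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
sub_headers = []
sub_holdings = []
# This f-string yields the literal text '<ticker> + "Substantial Shareholders"', and class_
# matches the HTML class attribute, not the table's heading, so find_all() returns [].
for found_table in soup.find_all('table', class_=f'{ticker_code} + "Substantial Shareholders"'):
    # extend() adds the found cells in place (list.append() returns None)
    sub_headers.extend(found_table.find_all('th'))
    sub_holdings.extend(found_table.find_all('td'))
print(sub_headers)
print(sub_holdings)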
Method 2
# Scrape the substantial holder list
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
# The heading text is not a class attribute of these elements, so both searches return [].
all_headers = soup.find_all("th", class_=f"{ticker_code} Substantial Shareholders")
all_holdings = soup.find_all("tr", class_=f"{ticker_code} Substantial Shareholders")
sub_headers = []
sub_holdings = []
for header in all_headers:
    sub_headers.append(header.text)
for holding in all_holdings:
    sub_holdings.append(holding.text)
print(sub_headers)
print(sub_holdings)
To scrape only the table titled "BRN Substantial Shareholders", you can locate it with a CSS selector:
table = soup.select_one("div:nth-of-type(11) table")
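The nth-of-type(11) selector relies on the table's position in the page, so it can break if the layout changes. A sketch of a position-independent alternative is to find the heading text and then take the next <table> after it with find_next(). This assumes the heading is a text node that comes before the table in document order; the HTML below is a hypothetical stand-in, and table_after_heading is an illustrative helper, not part of BeautifulSoup.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: each heading is followed by its table. The real page may
# use different tags, so inspect it with the browser's developer tools first.
html = """
<div><h3>BRN Directors</h3><table><tr><td>Director A</td></tr></table></div>
<div><h3>BRN Substantial Shareholders</h3>
<table><tr><th>Holder</th><th>Holding</th></tr><tr><td>Holder A</td><td>5.0</td></tr></table></div>
"""


def table_after_heading(soup, heading_text):
    """Return the first <table> after the text node matching heading_text, or None."""
    # string= searches text nodes; re.escape makes the heading match literally.
    label = soup.find(string=re.compile(re.escape(heading_text)))
    if label is None:
        return None
    # find_next() walks forward through the document from the matched text.
    return label.find_next("table")


ticker_code = "BRN"  # assumption: the ticker is part of the heading text
soup = BeautifulSoup(html, "html.parser")
table = table_after_heading(soup, f"{ticker_code} Substantial Shareholders")
print([cell.get_text(strip=True) for cell in table.find_all(["th", "td"])])
# ['Holder', 'Holding', 'Holder A', '5.0']
```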
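Found a simpler way:
import pandas as pd
# read_html returns a list of DataFrames, one per table whose text matches 'Holding'
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table)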
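pd.read_html always returns a list of DataFrames (one per table whose text matches the match pattern), so sub_table above is a list and the table itself is its first element. The sketch below shows this on a small hypothetical HTML string; it assumes pandas and a parser backend (lxml, or BeautifulSoup with html5lib) are installed. Passing current_url makes pandas download the page itself without running JavaScript, so if the table only appears after Selenium renders the page, wrapping driver.page_source in StringIO the same way is one option to try.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second contains the text "Holding".
html = """
<table><tr><th>Year</th><th>Revenue</th></tr><tr><td>2021</td><td>1.0</td></tr></table>
<table><tr><th>Holder</th><th>Holding</th></tr><tr><td>Holder A</td><td>5.0</td></tr></table>
"""

# StringIO is used because newer pandas versions deprecate passing literal HTML strings.
tables = pd.read_html(StringIO(html), match="Holding")
sub_table = tables[0]  # the single DataFrame that matched
print(len(tables))  # 1
print(sub_table)
```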